AnnoDIFP CTS Audio and Transcripts

Full Official Name: AnnoDIFP CTS Audio and Transcripts
Submission date: Nov. 7, 2025, 8:54 p.m.

Introduction

AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) CTS (Conversational Telephone Speech) Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the Florida Institute of Technology (FIT), and the University of New Haven (UNH) to support algorithm development for predicting personality traits. It contains 242.52 hours of English audio and transcripts from 1,179 calls involving 327 participants, paired with scores from two self-reported personality assessments: the HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and the Short Dark Triad (SD3).

Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. Selected participants were then invited to Phase 2a, in which behavioral and linguistic data were collected in a laboratory setting. In Phase 2b, participants took part in a telephone speech collection. This release covers the activities in Phase 2b. The data collected in Phase 2a is contained in AnnoDIFP Session Audio and Transcripts (LDC2025S06).

Data

Telephone calls were collected using LDC's robot-operator platform. The operator called participants every 24 hours during their indicated availability and paired each with another participant to speak on a prompted topic for 10 minutes. Further details on collection methodology are contained in the documentation accompanying this release.

There were a total of 327 participants in Phase 2a. This corpus contains audio and transcripts for 277 participants and transcripts only for 65 participants.

Speech data is presented as 16 kHz, 16-bit, single-channel (mono) FLAC-compressed MS-WAV files. Transcripts were produced automatically using the Rev.ai speech-to-text service. Text data is UTF-8 encoded.

Updates

No updates at this time.
