Danish SpeechDat(M) database - DB1

The Danish SpeechDat(M) database is the speech database collected within the SpeechDat(M) project. It consists of polyphone-like data recorded from 1,523 speakers. The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. Each prompted utterance is stored in a separate file, and the associated label files are stored in SAM file format. An accompanying ASCII file lists information about each speaker: speaker code, sex, age, region, and prompt number.

The lexicon is presented in a TAB-delimited ASCII file containing an alphabetically ordered list of the distinct lexical items occurring in the database. Each entry contains a frequency count and the corresponding pronunciation information. Example:

WORD              FREQUENCY  PHONEMIC TRANSCRIPTIONS
åbnede            104        O b n @ D | O b n @ D @
adresseangivelse  97         a d R a s @ a n g i: u l s @

The complete Danish SpeechDat database consists of 5 CD-ROMs. The first three CD-ROMs contain the application-oriented subset. The last two CD-ROMs contain the phonetically rich sentences.

The included items are:
· 5 application word phrases (semi-spontaneous)
· 12 connected digit strings with 8 digits
· 24 natural numbers (3-4 digits)
· 27 application words
· 3 dates, D3 spontaneous (birthday)
· 3 spelled words
· 2 money amounts, M1 small, M2 large
· City name (spontaneous)
· 3 yes/no questions (spontaneous)
· 22-25 sentences
· T1 time phrase, T2 time of day (spontaneous)

The 1,523 speakers come from 11 linguistic regions of Denmark and fall into five age groups (under 16, 16-30, 31-45, 46-60, over 60); 78% of them are between 16 and 60 years old. A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
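As an illustration of the lexicon format described above, the following Python sketch parses TAB-delimited lexicon lines into structured entries. It is not part of the database distribution. The names LexiconEntry, parse_lexicon_line and parse_lexicon are hypothetical, and the sketch assumes, based only on the example rows, that the columns appear in the order word, frequency, transcriptions, that "|" separates alternative pronunciations, and that the symbols within a pronunciation are separated by spaces. The presence of a header row in the actual file is also an assumption; the parser simply skips one if it is there.

```python
from typing import NamedTuple


class LexiconEntry(NamedTuple):
    """One lexicon row (hypothetical structure, not defined by the database)."""
    word: str
    frequency: int
    pronunciations: list  # one list of SAMPA symbols per pronunciation variant


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse one TAB-delimited lexicon line into a LexiconEntry."""
    word, frequency, transcriptions = line.rstrip("\r\n").split("\t")
    # Assumption: "|" separates alternative pronunciations, and symbols
    # within a variant are separated by spaces, as in the example rows.
    variants = [variant.split() for variant in transcriptions.split("|")]
    return LexiconEntry(word, int(frequency), variants)


def parse_lexicon(lines):
    """Parse lexicon lines, skipping blank lines and a header row if present."""
    entries = []
    for line in lines:
        if not line.strip() or line.startswith("WORD\tFREQUENCY"):
            continue
        entries.append(parse_lexicon_line(line))
    return entries


# The two example rows from the description, written with explicit TABs.
sample = [
    "WORD\tFREQUENCY\tPHONEMIC TRANSCRIPTIONS\n",
    "åbnede\t104\tO b n @ D | O b n @ D @\n",
    "adresseangivelse\t97\ta d R a s @ a n g i: u l s @\n",
]
lexicon = parse_lexicon(sample)

for entry in lexicon:
    print(entry.word, entry.frequency, len(entry.pronunciations))
```

When reading the real lexicon file, the lines would come from an open file object instead of the sample list; the character encoding of that file is not stated here and would need to be checked against the database documentation.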

