Danish SpeechDat(M) database - DB2

Full Official Name: Danish SpeechDat(M) database - DB2
Submission date: Jan. 24, 2014, 4:22 p.m.

The (polyphone-like) Danish SpeechDat(M) database contains recordings of 1,523 Danish speakers from 11 regions. Speech samples are stored as sequences of 8-bit, 8 kHz A-law samples. Each prompted utterance is stored in a separate signal file, and each signal file is accompanied by an ASCII label file in SAM format that contains the relevant descriptive information. The database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

The lexicon is provided as a TAB-delimited ASCII file containing an alphabetically ordered list of the distinct lexical items occurring in the database. Each entry contains a frequency count and the corresponding pronunciation information. Example:

WORD                FREQUENCY    PHONEMIC TRANSCRIPTIONS
åbnede              104          O b n @ D | O b n @ D @
adresseangivelse    97           a d R a s @ a n g i: u l s @

The complete Danish SpeechDat database is partitioned into 5 CD-ROMs. The first three CD-ROMs contain the application-oriented subset. The last two CD-ROMs contain the phonetically rich sentences.

Each speaker uttered the following items:

* 5 semi-spontaneous application word phrases
* 12 connected digit strings with 8 digits
* 24 natural numbers (3-4 digits)
* 27 application words
* 3 dates, including a spontaneous one (e.g. birthday)
* 3 spelled words
* 2 money amounts, one small and one large
* 1 spontaneous city name
* 3 spontaneous yes/no questions
* 22-25 sentences
* 2 time phrases, including a time phrase and a spontaneous time of day

The speakers are divided into 5 age groups: under 16, 16-30, 31-45, 46-60, over 60. 78% of the speakers are between 16 and 60 years old.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
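As an illustration of the lexicon format described above, the following Python sketch parses a single lexicon entry. It rests on assumptions that the description does not state explicitly: that each line has exactly three TAB-separated fields, that "|" separates alternative pronunciations of the same word, and that SAMPA symbols within a pronunciation are separated by spaces. The names LexiconEntry and parse_lexicon_line are hypothetical and are not part of the database distribution.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    word: str
    frequency: int
    # One list of SAMPA symbols per pronunciation variant.
    pronunciations: list[list[str]]


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse one entry: WORD <TAB> FREQUENCY <TAB> TRANSCRIPTIONS.

    Assumption: "|" separates pronunciation variants and symbols
    inside a variant are space-separated.
    """
    word, frequency, transcriptions = line.rstrip("\n").split("\t", 2)
    variants = [v.split() for v in transcriptions.split("|") if v.strip()]
    return LexiconEntry(word, int(frequency), variants)


# The two example entries from the description, with TABs restored.
sample_lines = [
    "åbnede\t104\tO b n @ D | O b n @ D @",
    "adresseangivelse\t97\ta d R a s @ a n g i: u l s @",
]
entries = [parse_lexicon_line(line) for line in sample_lines]
```

If the actual file begins with a header line (WORD, FREQUENCY, ...), that line would need to be skipped before parsing, since its second field is not an integer.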
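To make the 8-bit, 8 kHz A-law storage format mentioned above concrete, here is a minimal Python sketch that expands A-law bytes to 16-bit linear samples using the standard ITU-T G.711 A-law expansion. It assumes the signal files are raw, headerless streams with one A-law byte per sample; the description does not confirm this, so check the database documentation before relying on it. The function names are hypothetical.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                      # undo the alternate-bit inversion
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte stream (assumed headerless) to linear samples."""
    return [alaw_to_linear(byte) for byte in data]


def duration_seconds(num_samples: int, sample_rate: int = 8000) -> float:
    """Duration of a recording at the 8 kHz rate stated in the description."""
    return num_samples / sample_rate


# Smallest and largest magnitudes, positive and negative.
samples = decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A]))
```

Because each sample occupies one byte, the duration of an utterance under these assumptions is simply the file size in bytes divided by 8000.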

Creator(s)
Distributor(s)
Right Holder(s)