SpeechDat Catalan FDB database

Full Official Name: SpeechDat Catalan FDB database
Submission date: Jan. 24, 2014, 4:31 p.m.

The SpeechDat Catalan FDB database contains the recordings of 1,005 Catalan speakers (474 males, 531 females) recorded over the Spanish fixed telephone network. The database is partitioned into 4 CD-ROMs in ISO 9660 format. Speech samples are stored as uncompressed 8-bit, 8 kHz A-law sequences. Each prompted utterance is stored in a separate file, and each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

Each speaker uttered the following items:
- 3 application words
- 1 sequence of 10 isolated digits
- 4 connected digits (prompt sheet number: 6 digits; telephone number: 9/11 digits; credit card number: 14/16 digits; PIN code: 6 digits)
- 3 dates (spontaneous date, e.g. birthday; prompted date; relative and general date expression)
- 1 word spotting phrase using embedded application words
- 1 isolated digit
- 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
- 1 currency money amount
- 1 natural number
- 5 directory assistance names (1 spontaneous, e.g. own surname; 1 city of birth/growing up; 1 most frequent city out of a set of 500; 1 most frequent company/agency out of a set of 500; 1 "forename surname" out of a set of 150)
- 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question, including fuzzy questions)
- 9 phonetically rich sentences
- 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
- 4 phonetically rich words

The following age distribution has been obtained: 13 speakers are under 16, 473 are between 16 and 30, 286 are between 31 and 45, 192 are between 46 and 60, and 41 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
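
For readers who want to work with the 8-bit, 8 kHz A-law samples described above, the following is a minimal sketch of expanding A-law bytes to 16-bit linear PCM using the standard G.711 expansion. It is not taken from the database documentation: the function names `alaw_to_linear` and `decode_alaw` are hypothetical, and the sketch assumes the signal files are headerless raw A-law, which should be checked against the documentation shipped with the database.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated in the description


def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a = byte ^ 0x55              # undo the even-bit inversion used by A-law
    t = (a & 0x0F) << 4          # 4-bit mantissa
    seg = (a & 0x70) >> 4        # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a & 0x80 else -t  # sign bit set means positive


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of A-law bytes (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """One byte per sample, so duration is byte count over sample rate."""
    return len(data) / SAMPLE_RATE_HZ
```

Under these assumptions, the bytes of one signal file read with `open(path, "rb").read()` can be passed to `decode_alaw`, and `duration_seconds` gives the utterance length, since each sample occupies exactly one byte.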

Creator(s)
Distributor(s)
Right Holder(s)