Resource: FAME Speech Corpus

Reference The FAME! Speech Corpus
Date of Submission Dec. 16, 2016, 2:29 p.m.
Status accepted
ISLRN 340-994-352-616-4
Resource Type Speech
Media Type Audio
Source
Language Dutch, Frisian
Format/MIME Type sound/wav
Size 17 hrs
Access Medium download
Description

The corpus consists of 203 audio segments, each approximately 5 minutes long, extracted from various radio programs covering a time span of almost 50 years (1966-2015), which adds a longitudinal dimension to the database.
The content of the recordings is very diverse, including radio programs about culture, history, literature, sports, nature, agriculture, politics, society and languages.

The total duration of the manually annotated radio broadcasts is 18 hours, 33 minutes and 57 seconds. The stereo audio data has a sampling frequency of 48 kHz and a resolution of 16 bits per sample. The available meta-information helped the annotators to identify the speakers and mark them either with their names or, if the name is not known, with the same label. There are 309 identified speakers in the FAME! Speech Corpus, 21 of whom appear at least 3 times in the database. These recurring speakers are mostly program presenters and celebrities who appear in different recordings over the years. A further 233 speakers are unidentified due to a lack of meta-information. The total number of word- and sentence-level code-switching cases in the FAME! Speech Corpus is 3837.

Music portions have been replaced by noise, except where these overlap with speech.
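
As an illustration of the audio format stated in the description (stereo, 48 kHz, 16 bits per sample), the following minimal sketch checks whether a downloaded file matches those parameters, using Python's standard `wave` module. It assumes the files are uncompressed PCM WAV; the function name `check_wav_format` and the file path passed to it are hypothetical, since this record does not specify the corpus's file naming or layout.

```python
import wave

# Values taken from the description in this record.
EXPECTED_CHANNELS = 2          # stereo
EXPECTED_SAMPLE_RATE = 48000   # 48 kHz
EXPECTED_SAMPLE_WIDTH = 2      # 16 bits = 2 bytes per sample


def check_wav_format(path):
    """Return (matches, duration_seconds) for a PCM WAV file.

    `path` is a hypothetical local file name; the corpus's actual
    file names are not given in this record.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getframerate() == EXPECTED_SAMPLE_RATE
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )
        # Duration follows from the frame count and the sampling rate.
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration
```

Summing the returned durations over all segments would give a figure that can be compared with the total duration stated in the description.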

Version 1.0
Creator Henk van den Heuvel - Radboud University
Distributor ELRA, Henk van den Heuvel - Radboud University
Rights Holder Jouke Algra - Omrop Fryslân, Hanno Brand - Fryske Akademy, Margot van Mulken - Radboud University