TC-STAR English Training Corpora for ASR: Transcriptions of EPPS Speech

Full Official Name: TC-STAR English Training Corpora for ASR: Transcriptions of EPPS Speech
Submission date: Jan. 24, 2014, 4:31 p.m.

TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text-to-Speech Synthesis (TTS). This corpus consists of transcriptions of 92 hours of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English). The recordings (not included in the present package) were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) between May 2004 and May 2006. The corpus comprises 63 transcription files, stored in Transcriber XML file format. The speech databases produced within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications. For the corresponding recordings, see ELRA-S0251.
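
As an illustration of how transcriptions in Transcriber XML might be read, the following is a minimal Python sketch. It assumes the general Transcriber layout (a `Speakers` list, and `Turn` elements whose text is split into segments by `Sync` time markers); the exact conventions used in this corpus are not described here and may differ. The sample document, the speaker name, and the helper `read_segments` are hypothetical and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the general Transcriber layout (not corpus data).
SAMPLE = """<Trans version="1" audio_filename="example">
  <Speakers>
    <Speaker id="spk1" name="Example Speaker"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="7.5">
      <Turn speaker="spk1" startTime="0" endTime="7.5">
        <Sync time="0"/>
        thank you mister president
        <Sync time="3.2"/>
        the debate is closed
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def read_segments(root):
    """Return (speaker, start_time, text) for each Sync-delimited segment."""
    # Map speaker ids to display names; fall back to the raw id if unknown.
    names = {s.get("id"): s.get("name") for s in root.iter("Speaker")}
    segments = []
    for turn in root.iter("Turn"):
        speaker = names.get(turn.get("speaker"), turn.get("speaker"))
        current = None
        for child in turn:
            if child.tag == "Sync":
                # A Sync marker opens a new segment at the given time.
                current = [speaker, float(child.get("time")), []]
                segments.append(current)
            # Text following any child element belongs to the open segment.
            if current is not None and child.tail:
                current[2].extend(child.tail.split())
    return [(spk, t, " ".join(words)) for spk, t, words in segments if words]


segments = read_segments(ET.fromstring(SAMPLE))
for speaker, start, text in segments:
    print(f"{start:7.2f}  {speaker}: {text}")
```

For a file on disk, `ET.parse(path).getroot()` can be passed to `read_segments` instead of the parsed sample string.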

Creator(s)
Distributor(s)
Right Holder(s)