TRAD Pashto Broadcast News Speech Corpus

Full Official Name: TRAD Pashto Broadcast News Speech Corpus
Submission date: April 6, 2016, 4:51 p.m.

This corpus contains transcribed broadcast news recordings in Pashto. The recordings were collected from five sources: Ashna TV, Azadi Radio, Deewa Radio, Mashaal Radio and Shamshad TV. The corpus contains 108 hours of recordings covering more than 1,000 speakers. Transcriptions are provided together with the audio files and comprise about 46,000 segments and 1.1 million words. Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. This corpus was produced by ELDA within the PEA TRAD project, supported by the French Ministry of Defence (DGA).

Creator(s)
Distributor(s)
Right Holder(s)