TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data

Full Official Name: TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data
Submission date: April 6, 2016, 4:51 p.m.

This is a parallel corpus containing 10,000 Pashto words translated into English. The source texts come from three broadcast news transcriptions in the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). These texts are VOA Ashna TV programs recorded on 15/01/2011, 18/01/2011 and 19/01/2011. The same content has also been translated into French (see ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test set). Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. This corpus was produced by ELDA within the PEA TRAD project, supported by the French Ministry of Defence (DGA). It was used as a test set for an internal MT evaluation campaign.
