ISLRN

TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data

Full Official Name: TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data

Submission date: April 6, 2016, 4:51 p.m.

The corpus consists of transcriptions of 106 hours of Pashto broadcast news recordings, together with their French translations. The transcriptions are taken from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). The parallel corpus contains about 832,000 source words and 747,000 target words. No audio files are provided. Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. The corpus was produced by ELDA within the PEA TRAD project, supported by the French Ministry of Defence (DGA), and was used as training data for language modelling in machine translation.

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN:

802-643-297-429-4

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1267

Resource Type

Primary Text

Media Type

Text

Language(s)

French

Pushto

Access Medium