ALLIES Corpus

Full Official Name: ALLIES Corpus
Submission date: May 5, 2023, 11 a.m.

The ALLIES Corpus was produced within the European CHIST-ERA project ALLIES. The project made it possible to run an evaluation campaign for broadcast news diarization systems across time, using French data. It extends the earlier ESTER, REPERE and ETAPE evaluation campaigns, which were carried out for the French language in this field.

The corpus is based on the material used for the ESTER, REPERE and ETAPE evaluation packages (see ELRA Catalogue: http://catalogue.elra.info for the respective packages) and was built as an extension of those previously produced corpora. It contains corrected annotations from the earlier evaluation materials as well as new audio data with corresponding transcriptions. The corrections cover speaker names and re-segmentation.

The segmentation tasks consist of sound event segmentation, speaker tracking and speaker segmentation, detailed as follows:
- Sound event segmentation: tracking the parts which contain music (with or without speech) and the parts which contain speech (with or without music).
- Speaker tracking: detecting the parts of the document that correspond to a given speaker.
- Speaker segmentation: segmenting the document by speaker and grouping the parts spoken by the same speaker.

Overall, the ALLIES Corpus contains about 900 hours of news broadcast, including orthographic transcriptions, speaker annotations and segmentation.
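As an illustration only, the three segmentation tasks described above can be sketched on a toy annotation. The Segment type, the labels, the timings and the function names below are all hypothetical; they do not reflect the corpus file format or any official scoring tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the start of the recording
    end: float
    label: str    # speaker name or sound-event label (hypothetical values)


# Hypothetical annotation for one short recording.
speaker_segments = [
    Segment(0.0, 4.0, "speaker_A"),
    Segment(4.0, 9.0, "speaker_B"),
    Segment(9.0, 12.5, "speaker_A"),
]
event_segments = [
    Segment(0.0, 2.0, "music"),
    Segment(1.0, 12.5, "speech"),  # overlaps the music between 1.0 s and 2.0 s
]


def track_speaker(segments, speaker):
    """Speaker tracking: spans of the document spoken by one given speaker."""
    return [(s.start, s.end) for s in segments if s.label == speaker]


def group_by_speaker(segments):
    """Speaker segmentation: group the spans spoken by the same speaker."""
    groups = defaultdict(list)
    for s in segments:
        groups[s.label].append((s.start, s.end))
    return dict(groups)


def total_duration(segments, label):
    """Sound event segmentation: total time annotated with one event label."""
    return sum(s.end - s.start for s in segments if s.label == label)
```

Because music and speech may overlap, the event annotation is kept as two independent label tracks rather than a single partition of the timeline.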

Creator(s)
Distributor(s)
Right Holder(s)