Resource: A large-scale Arabic speech corpus for automatic speech synthesis

Reference AlKhalil Speech Corpus
Date of Submission Sept. 14, 2020, 11:25 a.m.
Status accepted
ISLRN 326-392-644-263-9
Resource Type Primary Text
Media Type Text, Audio
Language Arabic
Format/MIME Type audio/wav, text/TextGrid+txt
Size 15 hours

The AlKhalil Speech Corpus is a single-speaker Arabic speech database recorded by a professional male speaker. It was designed mainly for unit-selection speech synthesis, but it may also serve other applications such as end-to-end speech synthesis and speech recognition. The recorded texts are paragraphs and articles that were carefully selected to cover a range of domains, including science, literature, academic books, and technology. The corpus includes the following files:
1- 15 .wav files, mono (one channel), sampled at 24 kHz with 16-bit resolution.
2- 15 .TextGrid files containing phoneme-, word-, and lemma-level annotations aligned with the corresponding speech utterances. These files can be opened with Praat.
3- Orthographic-transcript.txt, which contains a fully diacritized, hand-checked orthographic transcription covering more than 80,000 Arabic words.
4- buckwalter_transcript.txt, which represents the orthographic transcript (file 3) in Buckwalter transliteration.
5- Pronunciation_transcript.txt, which is a phonetic representation of the audio files describing how the speaker actually pronounced the words. This file is particularly useful for unit-selection synthesis.
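Because the audio is specified as mono, 24 kHz, 16-bit, a quick header check can confirm that downloaded files match that format before they are used for training. The sketch below uses only Python's standard `wave` module; the function names are hypothetical, and it assumes the files are uncompressed PCM WAV (older Python versions reject the WAVE_FORMAT_EXTENSIBLE variant).

```python
import wave
from pathlib import Path


def check_wav_format(path, channels=1, rate=24000, sample_width_bytes=2):
    """Return True if the WAV header matches the expected format.

    Defaults reflect the stated corpus format: mono, 24 kHz, 16-bit (2 bytes/sample).
    """
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == channels
            and wf.getframerate() == rate
            and wf.getsampwidth() == sample_width_bytes
        )


def duration_seconds(path):
    """Duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_hours(directory):
    """Sum the durations of all .wav files in a (hypothetical) corpus directory."""
    # wave.open expects a str path or a file object, so convert Path to str.
    return sum(duration_seconds(str(p)) for p in Path(directory).glob("*.wav")) / 3600
```

Summing the durations with `total_hours` gives a rough cross-check against the stated size of 15 hours.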
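The .TextGrid files store the phoneme, word, and lemma annotations as Praat interval tiers. The following is a minimal sketch for reading interval tiers outside Praat. It assumes Praat's "long" text format and UTF-8 encoding (Praat can also save UTF-16, hence the `encoding` parameter). It ignores point tiers and the short format. The actual tier names used in the corpus files are not documented here, so the parser simply returns every tier under its stored name. All function and class names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interval:
    xmin: float   # start time in seconds
    xmax: float   # end time in seconds
    text: str     # label (phoneme, word, or lemma)


@dataclass
class Tier:
    name: str
    intervals: list = field(default_factory=list)


def parse_textgrid(content):
    """Parse interval tiers from a long-format Praat TextGrid string."""
    tiers = []
    current = None   # tier currently being filled
    pending = None   # xmin/xmax collected for the current interval
    for raw in content.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:  # a new tier starts
            current = Tier(m.group(1))
            tiers.append(current)
            pending = None
            continue
        if re.match(r"intervals \[\d+\]:$", line):  # a new interval starts
            pending = {}
            continue
        if pending is None or current is None:
            continue  # file header, tier-level xmin/xmax, "intervals: size", ...
        m = re.match(r"(xmin|xmax) = ([-+\d.eE]+)$", line)
        if m:
            pending[m.group(1)] = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m:
            text = m.group(1).replace('""', '"')  # Praat doubles embedded quotes
            current.intervals.append(Interval(pending["xmin"], pending["xmax"], text))
            pending = None
    return tiers


def read_textgrid(path, encoding="utf-8"):
    with open(path, encoding=encoding) as f:
        return parse_textgrid(f.read())
```

Each returned interval carries its start and end times, so word or phoneme segments can be cut from the matching .wav file by converting the times to sample indices (time × 24000).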

Version 1.0
Creator Oumaima Zine - Mohammed First University, Abdelouafi Meziane - Mohammed First University
Distributor Oumaima Zine - Mohammed First University
Rights Holder Oumaima Zine - Mohammed First University, Abdelouafi Meziane - Mohammed First University