Resource: Persian Speech Corpus

Date of Submission July 5, 2017, 2:52 p.m.
Status accepted
ISLRN 068-845-898-304-0
Resource Type Other
Media Type Audio
Language Persian
Access Medium Downloadable

This single-speaker speech corpus of about 2.5 hours was developed using the same methodologies as the PhD work carried out by Nawar Halabi at the University of Southampton. It was recorded in Persian (Tehrani accent) by one male speaker in a professional studio, using a "Blueberry" model microphone of the "Blue" brand, with a "PreSonus Studio Channel" as preamplifier and compressor. The recording was made in the "Reaper" software, with some plugins used to enhance the speaker's voice. Speech synthesized with this corpus has produced a high-quality, natural voice.
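One way to check the approximate 2.5-hour figure against a downloaded copy is to sum the durations of the .wav files (duration = frame count / frame rate). This is a minimal sketch that assumes the files sit directly in one directory and are uncompressed PCM WAV readable by Python's standard `wave` module; the function name `total_duration_seconds` is hypothetical.

```python
import wave
from pathlib import Path


def total_duration_seconds(corpus_dir: str) -> float:
    """Sum the durations of all .wav files directly inside corpus_dir.

    Hypothetical helper: assumes a flat directory of uncompressed PCM WAV
    files, which is not something the corpus documentation specifies.
    """
    total = 0.0
    for path in sorted(Path(corpus_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            # Duration of one file = number of frames / frames per second.
            total += w.getnframes() / w.getframerate()
    return total
```

For example, `total_duration_seconds("path/to/wav") / 3600` gives the total in hours (the path is a placeholder).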

This package includes:
- 399 .wav files containing spoken utterances.
- 399 .lab files containing the phonetic transcriptions of the utterances.
- 399 .TextGrid files containing the phoneme labels, with time stamps for the phoneme boundaries in the corresponding .wav files. These files can be opened with the Praat software.
- aligned.mlf, which contains the HTS-friendly alignments.
- Orthographic transcriptions, gathered in a single text file (orthographic-transcript.txt) with one line per utterance of the form "[wav_filename]" "[Orthographic Transcript]".
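A minimal sketch for reading the two plain-text index files above in Python. It assumes that each line of orthographic-transcript.txt holds two double-quoted fields separated by whitespace (filename first), that the file is UTF-8, and that aligned.mlf follows the standard HTK master label file layout: a `#!MLF!#` header, a quoted label-file name, lines of `start end label` with times in 100 ns units, and a terminating `.` line. The function names are hypothetical, and MLF features such as `->` directory redirection are not handled.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed line shape: "utt.wav" "transcript text"
_TRANSCRIPT_LINE = re.compile(r'^\s*"([^"]+)"\s+"(.*)"\s*$')

Segment = Tuple[Optional[float], Optional[float], str]


def parse_transcript_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one line into (wav_filename, transcript); None if it does not match."""
    m = _TRANSCRIPT_LINE.match(line)
    return (m.group(1), m.group(2)) if m else None


def read_transcripts(path: str) -> Dict[str, str]:
    """Map each wav filename to its transcript ("utf-8-sig" also tolerates a BOM)."""
    out: Dict[str, str] = {}
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            pair = parse_transcript_line(line)
            if pair:
                out[pair[0]] = pair[1]
    return out


def parse_mlf(lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Parse HTK-style MLF text into {label_file_name: [(start_s, end_s, label), ...]}."""
    result: Dict[str, List[Segment]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if current is None:
            # Start of an entry: a quoted label-file name such as "*/utt.lab".
            current = line.strip('"')
            result[current] = []
        elif line == ".":
            current = None  # end of the current entry
        else:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                # HTK times are in 100 ns units; divide by 1e7 to get seconds.
                start, end = int(fields[0]) / 1e7, int(fields[1]) / 1e7
                result[current].append((start, end, fields[2]))
            else:
                # Label-only line without time stamps.
                result[current].append((None, None, fields[0]))
    return result
```

For the alignment file, `parse_mlf(open("aligned.mlf", encoding="utf-8"))` would return the segments per utterance under these assumptions.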

Persian Speech Corpus by Nawar Halabi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Version 1.0
Distributor ELRA