Mandarin Chinese Speech Synthesis Corpus (Basic Corpus)

Full Official Name: Mandarin Chinese Speech Synthesis Corpus (Basic Corpus)
Submission date: Jan. 24, 2014, 4:30 p.m.

This corpus contains recordings of one female native Chinese speaker. It is composed of 20 texts totalling 109,227 words and has been proofread manually. The contents include phrases, digit strings, letter strings, uncommon words, neutral tone, final retroflexion, Latin alphabet, interrogative sentences, and 282 English words. The speaker was recorded in a professional recording studio on 2 channels, a microphone signal and a glottal wave (fundamental frequency) signal, for a total of 18.2 hours. Speech samples are stored as sequences of 16-bit, 44.1 kHz PCM on two channels. The total data size is 5.67 Gb across 12,679 files. The data is encoded in GB2312 format. The transcriptions include labels for four-class pause boundaries. This database is intended for use in text-to-speech and speech synthesis applications.
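As an illustration of the storage format described above, the following is a minimal Python sketch of how two-channel 16-bit PCM data and GB2312 text might be handled. It is not part of the corpus distribution. The function names are hypothetical, and the sketch assumes headerless, little-endian, interleaved samples; the description does not state the byte order, the file container, or which channel carries the microphone signal, so these should be checked against the actual files.

```python
import array
import struct
import sys

SAMPLE_RATE_HZ = 44_100  # from the corpus description
BYTES_PER_SAMPLE = 2     # 16-bit PCM
NUM_CHANNELS = 2         # microphone signal + glottal wave signal


def split_channels(pcm_bytes):
    """Split interleaved 16-bit two-channel PCM into two sample arrays.

    Assumption: samples are little-endian and interleaved frame by frame.
    The channel order (which one is the microphone) is not known here.
    """
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()          # convert little-endian data on big-endian hosts
    return samples[0::2], samples[1::2]


def duration_seconds(pcm_bytes):
    """Duration of a raw PCM buffer, given the format constants above."""
    frame_size = BYTES_PER_SAMPLE * NUM_CHANNELS
    return len(pcm_bytes) / (frame_size * SAMPLE_RATE_HZ)


def decode_transcription(raw_bytes):
    """Decode GB2312-encoded text into a Python string."""
    return raw_bytes.decode("gb2312")


if __name__ == "__main__":
    # Synthetic three-frame buffer standing in for real corpus audio.
    raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
    first, second = split_channels(raw)
    print(list(first), list(second))
    print(duration_seconds(raw))
    print(decode_transcription("语音合成".encode("gb2312")))
```

If the audio files turn out to be WAV files rather than raw PCM, the standard-library `wave` module can supply the frame bytes, and the same channel-splitting step would then apply to them.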

Creator(s)
Distributor(s)
Right Holder(s)