ISLRN

Chinese Kids Speech database (Lower Grade)

Full Official Name: Chinese Kids Speech database (Lower Grade)

Submission date: July 22, 2025, 5:16 p.m.

The Chinese Kids Speech database (Lower Grade) contains recordings of 184 Chinese child speakers (98 males and 86 females), aged 6 to 10, recorded in quiet rooms using smartphones. This database may be combined with the Chinese Kids Speech database (Upper Grade), also available in the ELRA Catalogue under reference ELRA-S0497.

The number of speakers, utterances, duration and age are as follows:

Number of speakers (Male/Female): 184 (98/86)
Number of utterances (average): 237 utt/spkr
Total number of utterances: 43,667
Age: from 6 to 10
Total hours of data: 87

1,426 sentences were used. Recordings were made with smartphones, and the audio data are stored in .wav files as 16 kHz, mono, 16-bit linear PCM.

Database
・Audio data: WAV format, 16 kHz, 16-bit, mono (recorded with smartphone)
・Transcription data: TSV format (tab-delimited), UTF-8 (without BOM), line ending: LF
・Size: 9.4 GB

Number of speakers by age and gender:

Age    Male   Female   Total
6      11     6        17
7      11     8        19
8      18     29       47
9      47     36       83
10     11     7        18

Structure of the database:

├─ readme.txt
├─ Chinese Kids Speech Database (Lower grade).pdf    Description document of the database
├─ transcription(Lower).tsv                          Transcription
└─ Low/                                              directory of audio data
   └─ (1st/2nd/3rd)                                  directory of version ID
      └─ (0/1)                                       directory of gender (0: male, 1: female)
         └─ (audio_file)                             audio file (WAV format, 16 kHz, 16-bit, mono)

The fields of “transcription(Lower).tsv” are as follows:

Field number   Contents
0              Script ID
1              Speaker ID
2              Audio file name
3              Transcription (in Chinese)

The file naming conventions for audio files are as follows:

Field number   Contents                Description                                 Remarks
0              Script ID               Four digits                                 XXXX: four digits
1              Speaker ID              Three digits                                XXX: three digits
2              Age                     Two digits                                  From 06 to 10
3              Gender                  0: male, 1: female
4              Utterance No.           Three digits                                Sequential numbering starting from 001 within each speaker
5              Recording date          YYYYMMDDHHMM
6              Recording device name   Recording device name                       Ex. NTH-AN00
7              OS                      Operating System info of recording device   Ex. android-11
8              Duration                Duration in msec                            Duration of the actual spoken utterance

The field separator character is “_”. For example, the audio file name “1318_373_09_1_010_202205041857_NTH-AN00_android-11_5480.wav” has the following meaning:

1318: script ID
373: speaker ID
09: age (nine years old)
1: gender (female)
010: utterance number
202205041857: recording date (May 4, 2022, at 6:57 PM)
NTH-AN00: recording device name
android-11: operating system info of recording device
5480: duration of the actual spoken utterance (5,480 msec)
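
The file naming convention described above could be decoded with a small parser along the following lines. This is only a minimal sketch, not a tool shipped with the database: the names `UtteranceInfo` and `parse_audio_name` are hypothetical, and it assumes that no field (for instance the device name) itself contains the “_” separator, so that every file name splits into exactly nine fields.

```python
from datetime import datetime
from typing import NamedTuple


class UtteranceInfo(NamedTuple):
    """Hypothetical container for the nine fields of an audio file name."""
    script_id: str
    speaker_id: str
    age: int
    gender: str
    utterance_no: int
    recorded_at: datetime
    device: str
    os_info: str
    duration_ms: int


def parse_audio_name(file_name: str) -> UtteranceInfo:
    """Split an audio file name on "_" and convert each field."""
    stem = file_name.strip()
    if stem.lower().endswith(".wav"):
        stem = stem[:-4]  # drop the ".wav" extension
    parts = stem.split("_")
    # Assumption: no field contains "_", so there are exactly 9 parts.
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {file_name!r}")
    script_id, speaker_id, age, gender, utt_no, date, device, os_info, dur = parts
    return UtteranceInfo(
        script_id=script_id,    # kept as text to preserve leading zeros
        speaker_id=speaker_id,
        age=int(age),
        gender={"0": "male", "1": "female"}[gender],
        utterance_no=int(utt_no),
        recorded_at=datetime.strptime(date, "%Y%m%d%H%M"),  # YYYYMMDDHHMM
        device=device,
        os_info=os_info,
        duration_ms=int(dur),
    )


info = parse_audio_name(
    "1318_373_09_1_010_202205041857_NTH-AN00_android-11_5480.wav"
)
print(info.speaker_id, info.age, info.gender, info.duration_ms)
```

Script and speaker IDs are kept as strings so that any leading zeros survive, which should make them easier to match against the corresponding columns of “transcription(Lower).tsv”.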

Creator(s)

Distributor(s)