Diaspora Tibetan Speech

Submission date: June 20, 2024, 10:47 p.m.

Diaspora Tibetan Speech was developed at Yale University. It contains approximately 28 hours of elicited Tibetan speech from 73 speakers in the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials, and speaker demographic information. Recordings were collected in 2016. All speakers were adults; they varied in age and in age at diaspora. A substantial number of speakers were born in Nepal.

Each speaker contributed one recording comprising a series of elicitation tasks: some demographic information; a word list and numbers; some sentences in isolation; a scripted story; and free speech based on "frog story" type illustrations. All elicitation materials are included with the corpus documentation in PDF format.

The word-list and number-list sections of the recordings were time-aligned at the word level as Praat TextGrids. Five recordings were fully transcribed word-for-word by a native Tibetan speaker; these transcripts are presented in both Microsoft Word and PDF format to preserve font encoding. They are not time-aligned but include general time stamps. Other transcripts are available as Excel spreadsheets with word-to-word correspondence of Tibetan script, phonetic transcription, and English translation.

Demographic information includes age at recording, age at diaspora, and other information. The audio data is presented as single-channel, 16 kHz, 16-bit WAV files.
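The stated audio format (single channel, 16 kHz, 16-bit) can be checked on a local copy of the recordings before further processing. The following is a minimal sketch using only Python's standard `wave` module. The function names `describe_wav` and `matches_corpus_format` are hypothetical helpers introduced for illustration and are not part of the corpus distribution; the sketch also assumes the files are uncompressed PCM WAV, which the description implies but does not state.

```python
import wave

# Expected values taken from the corpus description above.
EXPECTED_CHANNELS = 1            # single channel
EXPECTED_SAMPLE_RATE_HZ = 16000  # 16 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def describe_wav(path):
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": sample_rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frame_count / sample_rate,
        }


def matches_corpus_format(info):
    """Return True if the properties match the documented audio format."""
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
    )
```

Calling `describe_wav` on each recording and summing the `duration_s` values would also give a rough check against the stated total of approximately 28 hours.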

