Resource: GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)

Reference GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)
Date of Submission April 6, 2017, 5:50 p.m.
Status accepted
ISLRN 799-402-906-876-5
Resource Type Lexicon
Media Type Audio
Source
Language Bulgarian
Access Medium Downloadable
Description

This extended version of the GlobalPhone Bulgarian Pronunciation Dictionary, called Bulgarian-Dict260k, contains pronunciations of more than 260,000 word forms. It matches the phone set and format of the original GlobalPhone Bulgarian Pronunciation Dictionary (see ELRA-S0351), which covers 20,000 word forms. Bulgarian-Dict260k was built from an extension of the Bulgarian GlobalPhone text database, with the aim of improving language modeling and reducing the high out-of-vocabulary (OOV) rate that results from the rich morphology of Bulgarian. For this purpose, roughly 9 million word tokens were collected from online sources of national, international, and economic news, namely the online newspapers "Banker" (http://www.banker.bg/), "Kesh" (http://www.cash.bg), and "Sega" (http://www.segabg.com/). After text cleaning and normalization, all word forms were extracted. Pronunciations were generated automatically using hand-crafted grapheme-to-phoneme rules, then manually cross-checked by native speakers to correct any errors introduced by the automatic generation.

Version 1.0
Distributor ELRA