ArabLEX: Database of Arabic General Vocabulary (DAG)

Full Official Name: ArabLEX: Database of Arabic General Vocabulary (DAG)
Submission date: April 7, 2022, 2:49 p.m.

This database is part of the ArabLEX data set, which consists of the Database of Arabic General Vocabulary (DAG), the Database of Arabic Place Names (DAP), the Database of Foreign Names in Arabic (DAF) and the Database of Arab Names (DAN), available from ELRA under references ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107, respectively.

DAG is a comprehensive full-form lexicon of Arabic general vocabulary that includes all inflected, conjugated and cliticized forms. Each entry is accompanied by a rich set of morphological, grammatical and phonological attributes. Ideally suited for NLP applications, DAG provides precise phonemic transcriptions and full vowel diacritics designed to enhance Arabic speech technology. Proper nouns are in principle excluded, since they are covered by the other ArabLEX modules.

The database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of these options may be provided on request. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions in IPA and/or SAMPA can be provided, fine-tuned to a customer's specifications.

Quantity and size: 87,930,738 lines / 24,399 MB (23.8 GB)
File format: flat TSV text files
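Because the data is distributed as flat TSV files totalling roughly 88 million lines, it is usually more practical to stream the files than to load them whole. The following is a minimal Python sketch of that approach, not an official loader. The column layout is not specified in this description, so the sample rows use placeholder field names, and the file name `dag.tsv` and the UTF-8 encoding mentioned in the comments are assumptions.

```python
import csv
import io
from typing import Iterator, List, TextIO


def iter_tsv_rows(handle: TextIO) -> Iterator[List[str]]:
    """Yield one row at a time as a list of tab-separated fields."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits
    # plain lexicon data that is not CSV-quoted (an assumption here).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # csv.reader returns [] for blank lines; skip them
            yield row


def count_rows(handle: TextIO) -> int:
    """Count non-blank rows without keeping them in memory."""
    return sum(1 for _ in iter_tsv_rows(handle))


# Placeholder sample; the real column layout is not documented here.
SAMPLE = "form_1\tattr_a\tattr_b\nform_2\tattr_c\tattr_d\n"

if __name__ == "__main__":
    for fields in iter_tsv_rows(io.StringIO(SAMPLE)):
        print(fields)
    # For a real file (hypothetical name, assumed UTF-8):
    # with open("dag.tsv", encoding="utf-8", newline="") as f:
    #     print(count_rows(f))
```

Reading row by row keeps memory use roughly constant regardless of file size, which matters for a resource of about 24 GB.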

Creator(s)
Distributor(s)
Right Holder(s)