EthioSpeech Corpora comprises over 391 hours of recorded read speech in six Ethiopian languages, with approximately 200 speakers per language: Amharic (68 hours), Tigrigna (62 hours), Oromo (70 hours), Somali (56 hours), Afar (68 hours), and Sidama (68 hours). The dominant domain is media (mainly newspapers), but for some of the languages texts from other domains were also used, including spiritual content. The recordings were made on mobile devices with the LIG-Aikuma speech recording tool installed on them.

This project will be a valuable resource for developing well-performing automatic speech recognition (ASR) systems, useful in various aspects of daily life, for these six languages (in a monolingual setup) and for other related languages (in a multilingual and/or cross-lingual setup). Use cases of speech recognition systems built with this dataset include dictation systems, transcription systems, assistive technologies, spoken dialogue systems, speech translation, and other similar speech technologies.

To make the dataset representative, the team selected six working languages that are used across the regional states of Ethiopia and aimed to balance the gender and age of the readers. The gender distribution is nearly equal for Amharic, Tigrigna, and Oromo, whereas the readers for the other three languages are mainly male. The readers are between 18 and 40 years old.
More details are given below:

- Amharic:
  Number of recorded sentences (only verified): 25,610
  Number of speakers: 203
  Recorded speech length (hours:minutes): 68:11
- Tigrigna:
  Number of recorded sentences (only verified): 26,955
  Number of speakers: 210
  Recorded speech length (hours:minutes): 61:42
- Oromo:
  Number of recorded sentences (only verified): 25,287
  Number of speakers: 200
  Recorded speech length (hours:minutes): 69:57
- Somali:
  Number of recorded sentences (only verified): 25,175
  Number of speakers: 200
  Recorded speech length (hours:minutes): 55:57
- Afar:
  Number of recorded sentences (only verified): 25,659
  Number of speakers: 200
  Recorded speech length (hours:minutes): 67:53
- Sidama:
  Number of recorded sentences (only verified): 25,113
  Number of speakers: 200
  Recorded speech length (hours:minutes): 67:36
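
As a quick consistency check, the per-language figures listed above can be summed and compared with the stated total of over 391 hours. The following is a minimal Python sketch, assuming the speech lengths are given as hours:minutes. The names `STATS`, `to_minutes`, and `format_hhmm` are illustrative only and are not part of the corpus distribution.

```python
# Per-language figures copied from the list above:
# (verified sentences, speakers, recorded speech length as "hours:minutes").
# Assumption: the length values are hours:minutes, not minutes:seconds.
STATS = {
    "Amharic":  (25_610, 203, "68:11"),
    "Tigrigna": (26_955, 210, "61:42"),
    "Oromo":    (25_287, 200, "69:57"),
    "Somali":   (25_175, 200, "55:57"),
    "Afar":     (25_659, 200, "67:53"),
    "Sidama":   (25_113, 200, "67:36"),
}


def to_minutes(hhmm: str) -> int:
    """Convert an "hours:minutes" string to a total number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def format_hhmm(total_minutes: int) -> str:
    """Format a number of minutes back into "hours:minutes"."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


total_sentences = sum(sentences for sentences, _, _ in STATS.values())
# Sum of per-language speaker counts; this assumes no speaker
# recorded in more than one language.
total_speakers = sum(speakers for _, speakers, _ in STATS.values())
total_minutes = sum(to_minutes(length) for _, _, length in STATS.values())

print(f"Verified sentences: {total_sentences:,}")
print(f"Speakers (summed per language): {total_speakers:,}")
print(f"Recorded speech (hours:minutes): {format_hhmm(total_minutes)}")
```

Under that assumption the durations add up to 391:16, which agrees with the "over 391 hours" stated in the overview.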