The ATCO2 project aims to develop a unique platform for collecting, organizing and pre-processing air-traffic control (voice communication) data from the airspace. This project has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702. The JU receives support from the European Union's Horizon 2020 research and innovation programme and the Clean Sky 2 JU members other than the Union.

The project collected real-time voice communication between air-traffic controllers and pilots, available either directly through publicly accessible radio frequency channels or indirectly from air-navigation service providers (ANSPs). In addition to the voice communication data, contextual information is available in the form of metadata (i.e. surveillance data).

The dataset consists of two distinct packages:

- A corpus of ca. 4000 hours (untranscribed) of air-traffic control speech collected across different airports (Sion, Bern, Zurich, etc.) in .wav format for speech recognition. The speaker distribution is 90/10% between males and females, and the group contains native and non-native speakers of English. The raw data is also provided.
  Overall size of the dataset (measured after voice activity detection):
  - 5281 hours (English + non-English)
  - 4465 hours (English only)
  Overall raw size of audio files (sum of wav file lengths):
  - 6225 hours (English + non-English)
- A corpus of ca. 4 hours (transcribed) of air-traffic control speech collected across different airports (Sion, Bern, Zurich, etc.) in .wav format for speech recognition. The speaker distribution is 90/10% between males and females, and the group contains native and non-native speakers of English. This corpus has been manually transcribed and automatically annotated with orthographic information in XML format, including speaker noise information, SNR values and others. Ca. 1 hour of the annotations has been re-checked by a human.
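
The "sum of wav file lengths" figure quoted for the raw audio can be reproduced in principle by adding up the duration of every .wav file in a local copy of the corpus. The following is a minimal sketch using only Python's standard `wave` module. It assumes the files are uncompressed PCM .wav, and the function names and the directory layout (all files under one root folder) are illustrative assumptions, not part of the dataset's documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def total_wav_hours(root):
    """Sum the durations of all .wav files found under `root`, in hours."""
    # rglob walks sub-directories too; the real corpus layout is not
    # described in the source, so a recursive search is an assumption.
    total_seconds = sum(
        wav_duration_seconds(p) for p in sorted(Path(root).rglob("*.wav"))
    )
    return total_seconds / 3600.0
```

Calling `total_wav_hours("path/to/corpus")` returns the raw audio length in hours. This counts silence as well as speech, so it corresponds to the raw size rather than to the smaller figures measured after voice activity detection.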