Resource: Polish Speech Database

Reference Polish Speech Database
Date of Submission Oct. 16, 2019, 6:23 p.m.
Status accepted
ISLRN 803-554-461-385-1
Resource Type Primary Text
Media Type Text, Audio
Language Polish
Format/MIME Type audio/x-flac, text/plain
Size 18666949 KB
Access Medium Web Download


Polish Speech Database was developed by VoiceLab. It consists of approximately 280 hours of Polish speech (263,424 utterances) from 200 speakers, with corresponding transcripts.

Data collection was performed in Poland. Speakers were asked to record themselves for at least 60 minutes on their home computers, using a headset while reading text displayed on a website. The text consisted of sentences covering most of the speech sounds of Polish.
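As a rough sanity check on the corpus totals, the averages can be derived by spreading the reported hours and utterances evenly across speakers. This is only a sketch: the 280-hour figure is approximate, the entry gives no per-speaker durations, and the constant and function names below are illustrative rather than part of the distribution.

```python
# Rough averages derived from the totals in the catalog entry.
# TOTAL_HOURS is approximate ("approximately 280 hours"), so the results are too.
TOTAL_HOURS = 280
UTTERANCES = 263_424
SPEAKERS = 200
MIN_MINUTES_PER_SPEAKER = 60  # stated minimum recording time per speaker


def corpus_averages(total_hours: float, utterances: int, speakers: int) -> tuple[float, float, float]:
    """Return (mean seconds per utterance, mean minutes per speaker,
    mean utterances per speaker), assuming totals are spread evenly."""
    total_seconds = total_hours * 3600
    return (
        total_seconds / utterances,
        total_hours * 60 / speakers,
        utterances / speakers,
    )


sec_per_utt, min_per_spk, utt_per_spk = corpus_averages(TOTAL_HOURS, UTTERANCES, SPEAKERS)
print(f"mean utterance length:       {sec_per_utt:.2f} s")    # about 3.83 s
print(f"mean speech per speaker:     {min_per_spk:.0f} min")   # 84 min
print(f"mean utterances per speaker: {utt_per_spk:.0f}")        # 1317
print("above the 60-minute minimum:", min_per_spk >= MIN_MINUTES_PER_SPEAKER)
```

An average of about 84 minutes per speaker is consistent with the stated 60-minute minimum, but because only totals are published, this says nothing about how recording time is distributed across individual speakers.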

The database includes speaker metadata. There were 103 male and 97 female speakers, aged 15 to 60 years; most were between 15 and 30.


Speech data is presented as 16,000 Hz, 16-bit, single-channel, FLAC-compressed WAV files. Transcripts are UTF-8 encoded plain text.
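One way to confirm that downloaded audio matches this description, without decoding it, is to read the STREAMINFO metadata block that the FLAC format requires at the start of every native FLAC stream. The minimal sketch below uses only the Python standard library and assumes each audio file begins with the `fLaC` marker; the function names are hypothetical, and the corpus directory layout is not specified in this entry.

```python
import struct

FLAC_MARKER = b"fLaC"
STREAMINFO_TYPE = 0
STREAMINFO_LEN = 34
HEADER_BYTES = 4 + 4 + STREAMINFO_LEN  # marker + block header + STREAMINFO body


def parse_streaminfo(header: bytes) -> dict:
    """Decode the STREAMINFO block that begins every native FLAC stream."""
    if len(header) < HEADER_BYTES or header[:4] != FLAC_MARKER:
        raise ValueError("not a native FLAC stream (missing 'fLaC' marker)")
    block_type = header[4] & 0x7F  # high bit is the last-metadata-block flag
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != STREAMINFO_TYPE or block_len != STREAMINFO_LEN:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18-25 pack four fields into 64 bits:
    # sample rate (20) | channels - 1 (3) | bits per sample - 1 (5) | total samples (36)
    (packed,) = struct.unpack(">Q", header[18:26])
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        "duration_s": total_samples / sample_rate if sample_rate else None,
    }


def has_expected_format(info: dict) -> bool:
    """True if the stream matches the catalog's 16 kHz, 16-bit, mono description."""
    return (info["sample_rate"], info["bits_per_sample"], info["channels"]) == (16000, 16, 1)


def check_file(path: str) -> bool:
    """Read only the first 42 bytes of a .flac file and check its format."""
    with open(path, "rb") as f:
        return has_expected_format(parse_streaminfo(f.read(HEADER_BYTES)))
```

A call such as `check_file("some_speaker/some_utterance.flac")` (a hypothetical path) returns `True` for conforming files. For reference, audio at these parameters decodes to 16,000 samples/s × 2 bytes = 32,000 bytes per second of uncompressed PCM.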

Version 1.0
Creator Tomasz Szwelnik, Jacek Kawalec, Dorota Gutowska
Distributor Linguistic Data Consortium
Rights Holder Portions © 2019, © 2019 Trustees of the University of Pennsylvania