Resource: Spoken Wikipedia Corpus

Reference: The Spoken Wikipedia Corpus
Date of Submission: Nov. 7, 2016, 11:23 a.m.
Status: accepted
ISLRN: 684-927-624-257-3
Resource Type: Primary Text
Media Type: Text, Audio
Language: Dutch, English, German
Format/MIME Type: text/xml, audio/ogg
Size: 800 hours of read speech

The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are, for one reason or another, unable or unwilling to read the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae and adds text segmentation, normalization, time alignment and further annotations. This makes the material accessible for research and fosters new ways of interacting with it.

Version: 1.1
Creator: Timo Baumann - Universität Hamburg
Distributor: Timo Baumann - Universität Hamburg
Rights Holder: Timo Baumann - Universität Hamburg