Resource: Spoken Wikipedia Corpus

Reference The Spoken Wikipedia Corpus
Date of Submission Nov. 7, 2016, 11:23 a.m.
Status accepted
ISLRN 684-927-624-257-3
Resource Type Primary Text
Media Type Text, Audio
Source
Language Dutch, English, German
Format/MIME Type text/xml, audio/ogg
Size 800 hours of read speech
Description

The Spoken Wikipedia project unites volunteers who read Wikipedia articles aloud. Hundreds of spoken articles in multiple languages are available to users who, for one reason or another, are unable or unwilling to read the written version of an article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae (the per-language Spoken Wikipedia projects). It adds text segmentation, normalization, time alignment and further annotations, which makes the material accessible for research and fosters new ways of interacting with it.

Version 1.1
Creator Timo Baumann - Universität Hamburg
Distributor Timo Baumann - Universität Hamburg
Rights Holder Timo Baumann - Universität Hamburg