Resource: litkey

Reference Litkey Corpus
Date of Submission July 18, 2019, 4:26 p.m.
Status accepted
ISLRN 051-051-923-439-0
Resource Type Annotated corpus
Media Type XML
Source
Language German
Format/MIME Type text/xml
Size 276 MB (approx. 210,000 tokens)
Access Medium Download
Description

The Litkey Corpus is a richly annotated longitudinal corpus of written texts produced by primary school children in Germany from grades 2 to 4. It has been transcribed and annotated at several linguistic levels: POS tags, features of word-internal structure (phonemes, syllables, morphemes), key orthographic features of the target words, and a categorization of spelling errors. Comprehensive evaluations show that high accuracy was achieved on all levels, making the Litkey Corpus a useful resource for corpus-based research on the literacy acquisition of German primary school children and for developing NLP tools for educational purposes. The corpus is freely available at https://www.linguistics.rub.de/litkeycorpus/.

Version 1.0
Creator Stefanie Dipper - Ruhr-Universität Bochum, Eva Belke - Ruhr-Universität Bochum, Ronja Laarmann-Quante - Ruhr-Universität Bochum
Distributor Stefanie Dipper - Ruhr-Universität Bochum
Rights Holder Stefanie Dipper - Ruhr-Universität Bochum, Eva Belke - Ruhr-Universität Bochum, Ronja Laarmann-Quante - Ruhr-Universität Bochum