Resource: Reference Corpus of Middle High German (1050–1350)

Reference Reference Corpus of Middle High German (1050–1350)
Date of Submission Dec. 19, 2016, 11:31 a.m.
Status accepted
ISLRN 332-536-136-099-5
Resource Type Annotated corpus
Media Type XML
Source
Language Middle High German (ca. 1050-1500)
Format/MIME Type text/xml
Size 1.1 GB (approx. 2.5 million tokens)
Access Medium Download
Description

The Reference Corpus of Middle High German (1050–1350) consists of more than two million tokens, providing a mostly complete collection of written records from Early Middle High German (1050–1200) as well as a careful selection of Middle High German texts from 1200 to 1350. The corpus was compiled in the context of a series of projects at the Universities of Cologne, Bonn, and Bochum, beginning in the mid-1980s.

The transcriptions of the texts comprise two separate layers. The diplomatic layer records historical graphemes and conserves original word boundaries. Layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with part-of-speech tags (using the HiTS tagset), morphology, lemmas and other information. For detailed documentation, see https://www.linguistics.ruhr-uni-bochum.de/rem.

The corpus can be downloaded in CorA-XML format (see https://www.linguistics.ruhr-uni-bochum.de/comphist/resources/cora) under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
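
The two transcription layers described above could be read from an XML file roughly as in the following Python sketch. The element and attribute names (`token`, `tok_dipl`, `tok_anno`, `pos`, `lemma`, `utf`, `tag`), the sample token, and its tag values are assumptions made for illustration and are not taken from this record; the corpus documentation linked above is the authority on the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one diplomatic unit ("inder") that the second layer
# splits into two annotated units. All names and values here are
# illustrative assumptions, not the documented schema.
SAMPLE = """<text id="demo">
  <token id="t1">
    <tok_dipl id="t1_d1" utf="inder"/>
    <tok_anno id="t1_m1" utf="in">
      <lemma tag="in"/>
      <pos tag="APPR"/>
    </tok_anno>
    <tok_anno id="t1_m2" utf="der">
      <lemma tag="der"/>
      <pos tag="DDART"/>
    </tok_anno>
  </token>
</text>"""


def read_layers(xml_string):
    """Return (diplomatic forms, annotated units) from an XML string."""
    root = ET.fromstring(xml_string)

    # Diplomatic layer: original graphemes and word boundaries.
    diplomatic = [d.get("utf") for d in root.iter("tok_dipl")]

    # Second layer: modernised word boundaries carrying the annotations.
    annotated = []
    for anno in root.iter("tok_anno"):
        pos = anno.find("pos")
        lemma = anno.find("lemma")
        annotated.append({
            "form": anno.get("utf"),
            "pos": pos.get("tag") if pos is not None else None,
            "lemma": lemma.get("tag") if lemma is not None else None,
        })
    return diplomatic, annotated


diplomatic, annotated = read_layers(SAMPLE)
print(diplomatic)
print(annotated)
```

For files of the size listed in this record, an incremental parser such as `xml.etree.ElementTree.iterparse` may be preferable to loading a whole document into memory.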

Version 1.0
Creator Stefanie Dipper - Ruhr-Universität Bochum, Thomas Klein - Universität Bonn, Klaus-Peter Wegera - Ruhr-Universität Bochum
Rights Holder Stefanie Dipper - Ruhr-Universität Bochum, Klaus-Peter Wegera - Ruhr-Universität Bochum, Thomas Klein - Universität Bonn