Resource: Reference Corpus of Middle High German (1050–1350)
|Reference||Reference Corpus of Middle High German (1050–1350)|
|Date of Submission||Dec. 19, 2016, 11:31 a.m.|
|Resource Type||Annotated corpus|
|Language||Middle High German (ca. 1050–1500)|
|Size||1.1 GB (approx. 2.5 million tokens)|
The Reference Corpus of Middle High German (1050–1350) consists of more than two million tokens, providing a mostly complete collection of written records from Early Middle High German (1050–1200) as well as a careful selection of Middle High German texts from 1200 to 1350. The corpus was compiled in the context of a series of projects at the Universities of Cologne, Bonn, and Bochum, beginning in the mid-1980s.
The transcriptions of the texts comprise two separate layers. The diplomatic layer records historical graphemes and conserves original word boundaries. Layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with part-of-speech tags (using the HiTS tagset), morphology, lemmas and other information. For detailed documentation, see https://www.linguistics.ruhr-uni-bochum.de/rem.
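The two-layer design described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class and field names (`Token`, `AnnoToken`, `dipl`, `anno`) are hypothetical and are not taken from the Cora-XML schema, the sample words are invented rather than drawn from the corpus, and the tag strings are placeholders that have not been checked against the HiTS documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnoToken:
    """One unit on the second layer (modern German word boundaries)."""
    form: str
    pos: str = ""    # placeholder tag string, not verified against HiTS
    lemma: str = ""


@dataclass
class Token:
    """Links the diplomatic units of a token to its annotated units."""
    dipl: List[str]         # diplomatic layer: original word boundaries
    anno: List[AnnoToken]   # second layer: basis for linguistic annotation


def diplomatic_text(tokens: List[Token]) -> str:
    """Join the diplomatic units, preserving the original word boundaries."""
    return " ".join(unit for tok in tokens for unit in tok.dipl)


def annotated_forms(tokens: List[Token]) -> List[str]:
    """List the forms on the second layer, in order."""
    return [a.form for tok in tokens for a in tok.anno]


# Invented example: one unit written together in the manuscript is split
# into two units on the second layer.
sample = [
    Token(dipl=["inder"],
          anno=[AnnoToken("in", "APPR", "in"), AnnoToken("der", "DDART", "der")]),
    Token(dipl=["burc"],
          anno=[AnnoToken("burc", "NA", "burc")]),
]

print(diplomatic_text(sample))
print(annotated_forms(sample))
```

Keeping both layers attached to the same token makes it possible to move from an annotated form back to the diplomatic unit, and from there to the layout information recorded on that layer.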
The corpus can be downloaded in Cora-XML format (see https://www.linguistics.ruhr-uni-bochum.de/comphist/resources/cora) under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
|Creator||Stefanie Dipper - Ruhr-Universität Bochum, Thomas Klein - Universität Bonn, Klaus-Peter Wegera - Ruhr-Universität Bochum|
|Rights Holder||Stefanie Dipper - Ruhr-Universität Bochum, Klaus-Peter Wegera - Ruhr-Universität Bochum, Thomas Klein - Universität Bonn|