Reference Corpus of Early New High German (1350–1650)

Full Official Name: Reference Corpus of Early New High German (1350–1650)
Submission date: Nov. 17, 2021, noon

The Reference Corpus of Early New High German (1350–1650) comprises approx. 3.8 million tokens from a careful selection of Early New High German texts dating from 1350 to 1650. The corpus was compiled in a series of projects at the Universities of Bonn, Bochum, Halle and Potsdam, beginning in the 1970s. It is composed of three sub-corpora: ReF.RUB, ReF.MLU and ReF.UP. For detailed documentation, see https://www.linguistics.ruhr-uni-bochum.de/ref.

ReF.RUB and ReF.MLU use transcriptions that comprise two separate layers. The diplomatic layer records historical graphemes and preserves the original word boundaries; layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with part-of-speech tags (using the HiTS tagset), morphology and lemmas; parts of the texts have been annotated manually, the rest automatically. ReF.UP has been manually annotated with POS tags as well as with syntactic structures according to the TIGER scheme. Three texts are additionally annotated with inflectional morphology.

ReF.RUB and ReF.MLU can be downloaded in Cora-XML format (see https://www.linguistics.ruhr-uni-bochum.de/comphist/resources/cora); ReF.UP is provided in TIGER-XML format (see https://www.ims.uni-stuttgart.de/documents/ressourcen/werkzeuge/tigersearch/doc/html/TigerXML.html). All three sub-corpora are released under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
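As a rough illustration of how the TIGER-XML files of ReF.UP might be read, the following Python sketch parses a tiny hand-written TIGER-XML fragment with the standard library. The sample sentence, its IDs and its tag values are invented placeholders and are not taken from the corpus; the helper names read_sentences and bracket are hypothetical. The sketch assumes only the generic TIGER-XML layout (s, graph, terminals/t, nonterminals/nt, edge) and ignores secondary edges and any corpus-specific attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the generic TIGER-XML layout.
# Words, IDs and tag values are illustrative placeholders, not corpus data.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="der" pos="DDART"/>
          <t id="s1_2" word="man" pos="NA"/>
          <t id="s1_3" word="sprach" pos="VVFIN"/>
        </terminals>
        <nonterminals>
          <nt id="s1_501" cat="NP">
            <edge label="NK" idref="s1_1"/>
            <edge label="NK" idref="s1_2"/>
          </nt>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_501"/>
            <edge label="HD" idref="s1_3"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per <s> element with its tokens and phrase nodes."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        graph = s.find("graph")
        terminals = list(graph.iter("t"))
        # (word, pos) pairs in document order
        tokens = [(t.get("word"), t.get("pos")) for t in terminals]
        # terminal id -> word form, used to resolve edge targets
        words = {t.get("id"): t.get("word") for t in terminals}
        # nonterminal id -> (category, [(edge label, target id), ...])
        phrases = {
            nt.get("id"): (
                nt.get("cat"),
                [(e.get("label"), e.get("idref")) for e in nt.findall("edge")],
            )
            for nt in graph.iter("nt")
        }
        sentences.append({
            "id": s.get("id"),
            "root": graph.get("root"),
            "tokens": tokens,
            "words": words,
            "phrases": phrases,
        })
    return sentences


def bracket(node_id, words, phrases):
    """Render the subtree below node_id as a bracketed string."""
    if node_id in words:
        return words[node_id]
    cat, edges = phrases[node_id]
    children = " ".join(bracket(ref, words, phrases) for _, ref in edges)
    return "(" + cat + " " + children + ")"


if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(sent["id"], sent["tokens"])
        print(bracket(sent["root"], sent["words"], sent["phrases"]))
```

For a downloaded file, the same functions could be applied to the file's contents read as a string; very large files may call for incremental parsing (for example with ET.iterparse) rather than loading everything at once.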

Creator(s)
Distributor(s)
Right Holder(s)