Bulgarian Event Corpus

Full Official Name: Bulgarian Event Corpus
The Bulgarian Event Corpus consists of 324,905 tokens and is suitable for training Named Entity Recognition (NER), Named Entity Linking (NEL) and Event Recognition models for Bulgarian in a multidomain Humanities context. The texts are domain-specific: they come from the Social Sciences and Humanities and include scientific papers, archival documents, popular texts, and Wikipedia articles in the relevant areas. The annotation scheme follows the rationale of the CIDOC-CRM ontology, since this ontology is widely used in the GLAM (galleries, libraries, archives and museums) sector and in the Humanities. The scheme has two main layers: a Named Entity (NE) layer with 16 entity types, and an event layer with 39 event labels, in which each event is connected to its participants.
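To make the two-layer scheme more concrete, the following is a minimal Python sketch of how one annotated document could be held in memory. It is an assumption, not the corpus's actual file format: the class names, the token-offset convention, the labels `PER`, `LOC` and `Birth`, and the role names `person` and `place` are hypothetical placeholders for the real 16 NE types and 39 event labels. The sketch shows events referring to their participants by entity ID, plus a check that every participant exists in the NE layer.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One span in the NE layer (hypothetical representation)."""
    id: str
    start: int     # token offset, inclusive
    end: int       # token offset, exclusive
    ne_type: str   # stands in for one of the 16 NE types; names are hypothetical


@dataclass
class Event:
    """One annotation in the event layer (hypothetical representation)."""
    id: str
    label: str                 # stands in for one of the 39 event labels
    trigger: tuple[int, int]   # token span of the event trigger
    participants: dict[str, str] = field(default_factory=dict)  # role -> entity id


@dataclass
class Document:
    tokens: list[str]
    entities: list[Entity] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)

    def entity_text(self, ent: Entity) -> str:
        """Surface string of an entity span."""
        return " ".join(self.tokens[ent.start:ent.end])

    def dangling_participants(self) -> list[tuple[str, str]]:
        """(event id, entity id) pairs whose entity is missing from the NE layer."""
        known = {e.id for e in self.entities}
        return [
            (ev.id, ent_id)
            for ev in self.events
            for ent_id in ev.participants.values()
            if ent_id not in known
        ]


# Illustrative sentence: "Иван Вазов е роден в Сопот ." (Ivan Vazov was born in Sopot.)
doc = Document(
    tokens=["Иван", "Вазов", "е", "роден", "в", "Сопот", "."],
    entities=[Entity("T1", 0, 2, "PER"), Entity("T2", 5, 6, "LOC")],
    events=[Event("E1", "Birth", (3, 4), {"person": "T1", "place": "T2"})],
)
print(doc.entity_text(doc.entities[0]))  # Иван Вазов
print(doc.dangling_participants())       # []
```

Keeping events as references to NE-layer IDs, rather than copying the spans, mirrors the description of the event layer as connecting each event to its participants, and makes consistency between the two layers easy to check.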

