Full Official Name: The Serbian Cross-Level Semantic Similarity News Corpus
Submission date: April 28, 2022, 12:13 a.m.

The Serbian CLSS News Corpus consists of 1000 phrase-sentence and 1000 sentence-paragraph pairs in Serbian, gathered from news sources on the web. Each pair was manually annotated with a fine-grained semantic similarity score on a 0-4 scale. The final score for each pair was obtained by averaging the individual scores of five annotators. A more detailed description of the corpus is available on its webpage, as well as in the following reference paper: Cross-Level Semantic Similarity for Serbian Newswire Texts, Vuk Batanović, Maja Miličević Petrović, in Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France (2022).
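
The averaging step described above can be illustrated with a minimal Python sketch. The function name `final_score` and the example scores are hypothetical and are not taken from the corpus or its tooling. The sketch only assumes what the description states: five annotators per pair and scores on the 0-4 scale.

```python
def final_score(annotator_scores):
    """Average the individual annotator scores for one text pair."""
    # Assumption from the description: exactly five annotators per pair.
    if len(annotator_scores) != 5:
        raise ValueError("expected scores from five annotators")
    # Assumption from the description: scores lie on the 0-4 scale.
    if any(not 0 <= s <= 4 for s in annotator_scores):
        raise ValueError("scores must lie on the 0-4 scale")
    return sum(annotator_scores) / len(annotator_scores)


# Hypothetical scores for a single pair, not real corpus data.
example_scores = [3, 4, 3, 2, 3]
print(final_score(example_scores))  # 3.0
```

Because the final score is a mean of five values, it can fall between the integer points of the scale (for example 2.4 or 3.6).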

