SerbMR-2C

Full Official Name: The Serbian Movie Review Dataset (2 Classes)
Submission date: Feb. 19, 2016, 9:39 a.m.

This balanced two-class sentiment analysis dataset contains 1682 movie reviews in Serbian (841 positive and 841 negative). All reviews written in Cyrillic were converted to the Latin script. The great majority of the reviews are written in the Ekavian pronunciation.

SerbMR-2C is a subset of the imbalanced "Collected Movie Reviews in Serbian" dataset. It was constructed by including all 841 negative reviews from that dataset and selecting 841 positive reviews in the manner described in the paper: Vuk Batanović, Boško Nikolić, Milan Milosavljević, Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset, in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia (2016).

The balancing procedure minimizes the non-sentiment-related differences between the classes and takes into account review scores, review lengths, and the differences in writing styles used on different source websites. SerbMR-2C is also a subset of the SerbMR-3C dataset: it contains the positive and negative reviews from that dataset, but not the neutral ones.
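For readers who want to apply the same script normalization to their own Serbian text before combining it with this dataset, the following is a minimal sketch of Cyrillic-to-Latin conversion. It uses the standard one-to-one correspondence between the two Serbian alphabets; the function name `cyrillic_to_latin` and the handling of capital digraphs are assumptions made for illustration, and this is not necessarily the tool or procedure the dataset authors used.

```python
# Illustrative sketch only: standard Serbian Cyrillic-to-Latin correspondence.
# The names below are hypothetical and do not come from the dataset's tooling.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Convert Serbian Cyrillic letters to Latin; leave other characters as-is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Uppercase Cyrillic letters become title-cased Latin ("Љ" -> "Lj").
            # Assumption: words written entirely in capitals are not special-cased.
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(cyrillic_to_latin("Одличан филм"))  # Odličan film
```

Text that is already in the Latin script passes through unchanged, so the function can be applied uniformly to a mixed-script collection.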

Creator(s)
Distributor(s)
Right Holder(s)