SerbMR-3C

Full Official Name: The Serbian Movie Review Dataset (3 Classes)
Submission date: Feb. 19, 2016, 9:39 a.m.

SerbMR-3C is a balanced three-class sentiment analysis dataset containing 2523 movie reviews in Serbian: 841 positive, 841 neutral, and 841 negative. All reviews originally written in Cyrillic were converted into the Latin script. The great majority of the reviews are written in the Ekavian pronunciation.

SerbMR-3C is a subset of the imbalanced "Collected Movie Reviews in Serbian" dataset. It was constructed by including all 841 negative reviews from that dataset and selecting 841 positive and 841 neutral reviews in the manner described in the paper:

Vuk Batanović, Boško Nikolić, Milan Milosavljević. Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia (2016).

The balancing procedure minimizes the non-sentiment-related differences between the classes, taking into account review scores, review lengths, and the differences in writing styles used on different source websites.

SerbMR-3C is also a superset of the SerbMR-2C dataset: it contains all of the positive and negative reviews from that dataset, plus the additional 841 neutral reviews.
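
The Cyrillic-to-Latin conversion mentioned in the description can be illustrated with a minimal Python sketch based on the standard one-to-one correspondence between the Serbian Cyrillic and Latin alphabets. This is not the tool used by the dataset authors; the function name `cyrillic_to_latin` and the capitalization rule for digraphs (Љ becomes "Lj" rather than "LJ") are assumptions made for illustration only.

```python
# Illustrative sketch only: NOT the conversion tool used to build SerbMR-3C.
# Maps the 30 letters of the Serbian Cyrillic alphabet to their Latin counterparts.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic characters; leave everything else unchanged."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            if ch != lower:
                # Assumption: an uppercase digraph letter is written in title
                # case ("Lj"), which is wrong for words typed in ALL CAPS.
                latin = latin.capitalize()
            out.append(latin)
        else:
            # Latin letters, digits, punctuation, and whitespace pass through.
            out.append(ch)
    return "".join(out)


print(cyrillic_to_latin("Филм је одличан"))  # → Film je odličan
```

Text that is already in the Latin script passes through unchanged, so the same function can be applied to a mixed-script collection of reviews.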
