ISLRN

SerbMR-2C

Full Official Name: The Serbian Movie Review Dataset (2 Classes)

Submission date: Feb. 19, 2016, 9:39 a.m.

This two-class, balanced sentiment analysis dataset contains 1682 movie reviews in Serbian (841 positive and 841 negative). All reviews originally written in Cyrillic were converted into the Latin script. The vast majority of the reviews use the Ekavian pronunciation.

SerbMR-2C is a subset of the imbalanced "Collected Movie Reviews in Serbian" dataset. It was constructed by including all 841 negative reviews from that dataset and selecting 841 positive reviews using the procedure described in the paper: Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset, Vuk Batanović, Boško Nikolić, Milan Milosavljević, in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia (2016). The balancing procedure minimizes non-sentiment-related differences between the classes, taking into account review scores, review lengths, and the differences in writing style between the source websites.

SerbMR-2C is also a subset of the SerbMR-3C dataset: it contains that dataset's positive and negative reviews, but not its neutral ones.
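The Cyrillic-to-Latin conversion mentioned above relies on the near one-to-one correspondence between the two Serbian alphabets. The following is a minimal Python sketch, assuming standard Serbian Latin orthography. The function name `cyr_to_lat` is hypothetical, and this is not necessarily the tool the dataset authors used.

```python
# Hypothetical helper: transliterate Serbian Cyrillic text into Serbian Latin.
# Assumes standard Serbian orthography; not necessarily the dataset authors' tool.

_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Uppercase Cyrillic letters map to capitalized Latin equivalents (e.g. "Љ" -> "Lj").
# Simplification: an all-caps word containing Љ, Њ or Џ becomes e.g. "LjUBAV"
# rather than "LJUBAV".
_UPPER = {cyr.upper(): lat.capitalize() for cyr, lat in _LOWER.items()}

# str.maketrans accepts single-character keys mapped to strings of any length,
# so the digraphs lj, nj and dž are handled directly.
_TABLE = str.maketrans({**_LOWER, **_UPPER})


def cyr_to_lat(text: str) -> str:
    """Convert Serbian Cyrillic characters to Latin; all other characters pass through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(cyr_to_lat("Љубав и џем"))  # Ljubav i džem
```

Because Latin characters, digits and punctuation are not in the table, text that is already in Latin script passes through unchanged, so mixed-script reviews can be normalized in one pass.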

Creator(s)

Innovation Center of the School of Electrical Engineering, University of Belgrade - Vuk Batanović

Distributor(s)

Innovation Center of the School of Electrical Engineering, University of Belgrade - Vuk Batanović

Right Holder(s)

Innovation Center of the School of Electrical Engineering, University of Belgrade - Vuk Batanović

Status: Accepted

ISLRN

016-049-192-514-1

Version

1.0

Source

http://vukbatanovic.github.io/SerbMR

Resource Type

Primary Text

Media Type

Text

Language(s)

Serbian

Access Medium

Website