|Reference||The Serbian Movie Review Dataset (3 Classes)|
|Date of Submission||Feb. 19, 2016, 9:39 a.m.|
|Resource Type||Primary Text|
|Size||2523 movie reviews|
This balanced three-class sentiment analysis dataset contains 2523 movie reviews in Serbian (841 positive, 841 neutral, and 841 negative). All reviews originally written in Cyrillic were converted into the Latin script. The great majority of the reviews are written in the Ekavian pronunciation.
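As an illustration of the script conversion mentioned above, the following is a minimal sketch of Serbian Cyrillic-to-Latin transliteration using the standard one-to-one letter correspondence. It is not the tool the dataset authors used, which the record does not name. The function name `to_latin` and the choice to render uppercase digraph letters in title case (`Љ` as `Lj`) are assumptions made for this example.

```python
# Standard Serbian Cyrillic -> Latin correspondence (30 lowercase letters).
# Illustrative only: the dataset's actual conversion tool is not specified.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic characters, leaving all others as-is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            # Latin letters, digits, punctuation and whitespace pass through.
            out.append(ch)
            continue
        lat = CYR_TO_LAT[lower]
        # Assumption: an uppercase Cyrillic letter maps to a title-cased
        # Latin form, so digraphs become "Lj", "Nj", "Dž".
        out.append(lat if ch == lower else lat.capitalize())
    return "".join(out)


print(to_latin("Добар филм"))
```

Text that is already in the Latin script passes through unchanged, so the function can be applied uniformly to a mixed-script collection. Words written entirely in capitals would need extra handling for the digraphs, which this sketch does not attempt.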
SerbMR-3C is a subset of the imbalanced "Collected Movie Reviews in Serbian" dataset. It was constructed by taking all 841 negative reviews from that dataset and selecting 841 positive and 841 neutral reviews in the manner described in the paper:
Vuk Batanović, Boško Nikolić, Milan Milosavljević. Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016.
The balancing procedure minimizes the non-sentiment-related differences between the classes and takes into account review scores, review lengths, and the differences in writing style across the source websites. SerbMR-3C is also a superset of the SerbMR-2C dataset: it contains all of the positive and negative reviews from that dataset, plus the additional 841 neutral reviews.
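The class sizes stated above can be sanity-checked with a few lines of code. The sketch below is hypothetical: the record does not describe the distribution file layout, so the labels are built as a placeholder list from the stated counts rather than read from the dataset, and `is_balanced` is a helper introduced only for this example. The SerbMR-2C size is inferred from the superset relation and is not stated directly in the record.

```python
from collections import Counter


def is_balanced(labels) -> bool:
    """Return True when every class occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) == 1


# Placeholder labels built from the counts stated in this record;
# a real check would read the labels from the downloaded dataset.
labels_3c = ["positive"] * 841 + ["neutral"] * 841 + ["negative"] * 841

counts = Counter(labels_3c)
total = sum(counts.values())

# Inferred from the superset relation: SerbMR-2C holds the positive
# and negative reviews only.
size_2c = counts["positive"] + counts["negative"]

print(total, is_balanced(labels_3c), size_2c)
```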
|Creator||Vuk Batanović - Innovation Center of the School of Electrical Engineering, University of Belgrade|
|Distributor||Vuk Batanović - Innovation Center of the School of Electrical Engineering, University of Belgrade|
|Rights Holder||Vuk Batanović - Innovation Center of the School of Electrical Engineering, University of Belgrade|