Collected Movie Reviews in Serbian

Full Official Name: Collected Movie Reviews in Serbian
Submission date: Feb. 19, 2016, 9:39 a.m.

This is a collection of 4725 movie reviews in Serbian, constructed for the task of sentiment analysis. Reviews were gathered from the following eight websites:

* 2kokice.com
* filmskerecenzije.com
* filmskihitovi.blogspot.com
* happynovisad.com
* kakavfilm.com
* mislitemojomglavom.blogspot.com
* popboks.com
* yc.rs

This corpus contains the reviews published on these websites from their inception until 01/01/2015. Some of the reviews are written in Cyrillic, but most of them are in the Latin script. A great majority of the reviews are in the Ekavian pronunciation.

Scoring systems

A 1-10 scoring system was adopted as the standard, since most websites use it. A 1-5 scoring system, used on happynovisad.com and yc.rs, was translated to 1-10 by multiplying the original scores by two. For these websites a plus/minus next to the original score was treated as an increment/decrement of the translated score. Pluses/minuses in the 1-10 scoring systems were ignored and X.5 scores were rounded down to X. In a few rare instances where a zero score was given, it was translated into a score of one.

Score distribution

This dataset is skewed towards the positive reviews - if scores 1-4 are treated as negative, 5-6 as neutral, and 7-10 as positive, the corpus contains 841 negative, 1278 neutral, and 2606 positive reviews.

Subsets

The SerbMR-2C/SerbMR-3C datasets are two-class/three-class balanced subsets of this collection.

Reference paper

Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset, Vuk Batanović, Boško Nikolić, Milan Milosavljević, in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia (2016).
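The score translation rules and the negative/neutral/positive grouping described in this entry can be sketched in Python as follows. This is only an illustration, not the authors' actual preprocessing code: the function names, the raw score string format (for example "3+" or "7.5"), the clamping of results above 10, and the handling of half scores on the 1-5 scale are all assumptions that the description does not specify.

```python
import math

# Websites that use the 1-5 scoring system, as listed in the description.
FIVE_POINT_SITES = {"happynovisad.com", "yc.rs"}


def normalize_score(raw: str, site: str) -> int:
    """Translate a raw score string (hypothetical format) to the 1-10 scale."""
    raw = raw.strip()
    modifier = 0
    if raw.endswith("+"):
        modifier, raw = 1, raw[:-1]
    elif raw.endswith("-"):
        modifier, raw = -1, raw[:-1]
    value = float(raw)

    if site in FIVE_POINT_SITES:
        # 1-5 scale: multiply by two; a plus/minus shifts the translated score.
        # Assumption: a half score such as 3.5 simply doubles to 7.
        score = int(value * 2) + modifier
    else:
        # 1-10 scale: pluses/minuses are ignored, X.5 is rounded down to X.
        score = math.floor(value)

    # A zero score becomes one. Clamping at 10 is an assumption for
    # inputs such as "5+" on the 1-5 scale.
    return min(10, max(1, score))


def sentiment_label(score: int) -> str:
    """Group a 1-10 score into the three classes used for the distribution."""
    if score <= 4:
        return "negative"
    if score <= 6:
        return "neutral"
    return "positive"


print(normalize_score("3+", "yc.rs"), sentiment_label(normalize_score("3+", "yc.rs")))
```

The reported class sizes are consistent with the corpus size: 841 + 1278 + 2606 = 4725.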
