Resource: SentiEcon GS-1000

Reference SentiEcon GS-1000
Date of Submission Dec. 9, 2019, 5:24 p.m.
Status accepted
ISLRN 524-008-163-978-0
Resource Type Gold Standard
Media Type Text
Language English
Format/MIME Type text/xml
Size 180
Access Medium Digital

To evaluate the performance of sentiment lexicons for the economic and financial domain, we present this manually annotated gold standard dataset, which consists of 1,000 random sentences drawn from a corpus of daily business news, the Esmeraldas Great Recession News Corpus. Two domain experts annotated the dataset by classifying each sentence as belonging to one of three categories: POSITIVE, NEGATIVE, and NONE.
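As an illustration of how a gold standard of this kind might be used, the following minimal sketch scores a lexicon's sentence-level predictions against gold labels. The function name `evaluate` and the toy label lists are hypothetical, and the choice of accuracy with per-class precision and recall is an assumption; the record does not specify an evaluation metric or the layout of the XML file, so labels are assumed to have been loaded into plain lists already.

```python
from collections import Counter

# The three categories used in the gold standard.
LABELS = ("POSITIVE", "NEGATIVE", "NONE")


def evaluate(gold, predicted):
    """Return (accuracy, {label: (precision, recall)}) for two label lists.

    Hypothetical helper: both lists are assumed to be aligned sentence by
    sentence and to contain only the labels in LABELS.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count every (gold label, predicted label) pair.
    pairs = Counter(zip(gold, predicted))
    total = len(gold)
    accuracy = sum(pairs[(lab, lab)] for lab in LABELS) / total if total else 0.0

    per_class = {}
    for lab in LABELS:
        true_pos = pairs[(lab, lab)]
        n_predicted = sum(pairs[(g, lab)] for g in LABELS)
        n_gold = sum(pairs[(lab, p)] for p in LABELS)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        per_class[lab] = (precision, recall)
    return accuracy, per_class


# Toy data, invented for illustration only (not taken from the dataset).
gold = ["POSITIVE", "NEGATIVE", "NONE", "NONE"]
predicted = ["POSITIVE", "NONE", "NONE", "NONE"]
accuracy, per_class = evaluate(gold, predicted)
```

Reporting per-class figures alongside accuracy is useful here because a lexicon that labels most sentences NONE can still reach a high overall accuracy while missing most polar sentences.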

The annotators were instructed to annotate each sentence taking into account only the information available in it. Annotation was carried out independently, and the annotators were then asked to reach a consensus on the sentences where their labels differed. Similarly to Malo et al. (2014), our annotators were instructed to consider the following main guidelines while annotating the phrases:
• There are no fixed rules about how particular words should be annotated.
• Avoid bias based on prior knowledge about the company or institution. Thus, each sample sentence should be annotated by using the information that is explicitly available.
• Be as consistent as possible with respect to your own annotations.
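
The two-step procedure described above (independent annotation followed by consensus on differing cases) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `disagreements` and `cohen_kappa` and the toy label lists are hypothetical, and the record reports no agreement statistic, so Cohen's kappa is shown merely as one common way to quantify agreement between two annotators before consensus.

```python
from collections import Counter


def disagreements(labels_a, labels_b):
    """Indices of sentences the two annotators labelled differently.

    These are the cases that would go to the consensus step.
    """
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two aligned, non-empty label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: share of sentences with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, from each annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)

    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations, invented for illustration only (not taken from the dataset).
annotator_1 = ["POSITIVE", "NEGATIVE", "NONE", "NONE"]
annotator_2 = ["POSITIVE", "NONE", "NONE", "NONE"]
to_discuss = disagreements(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters with a three-way scheme in which one category may dominate.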

We decided to use sentences rather than paragraphs or full documents because it has been shown in the literature that document-level sentiment analysis (SA) in the financial domain does not generally account for the relevance of text segments, as individual sentences in financial news typically focus on different aspects, which may express different sentiments. For example, after analyzing 1,000 random sentences from financial announcements, Lutz et al. (2019) concluded that an accurate classification of sentences would enable more fine-grained explanatory analyses of financial texts and also improve existing prediction systems. Sentence-level classification was also used in other relevant works, such as Malo et al. (2014) and Sinha et al. (2019).

The length of the news articles was also a deciding factor: such articles are rather long and describe and contrast different concepts, events and entities, which introduces a number of extraneous variables and unnecessarily complicates the task for our purposes.

Version 1.0
Creator Javier Fernandez-Cruz
Distributor Javier Fernandez-Cruz
Rights Holder Javier Fernandez-Cruz