Crowdsourcing-based Annotation of Emotions in Filipino and English Tweets
Fermin Roberto Lapitan | Riza Theresa Batista-Navarro | Eliezer Albacea
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
The automatic analysis of emotions conveyed in social media content, e.g., tweets, has many beneficial applications. In the Philippines, one of the most disaster-prone countries in the world, such methods could potentially enable first responders to make timely decisions despite the risk of data deluge. However, recognising emotions expressed in Philippine-generated tweets, which are mostly written in Filipino, English or a mix of both, is a non-trivial task. In order to facilitate the development of natural language processing (NLP) methods that will automate such type of analysis, we have built a corpus of tweets whose predominant emotions have been manually annotated by means of crowdsourcing. Defining measures ensuring that only high-quality annotations were retained, we have produced a gold standard corpus of 1,146 emotion-labelled Filipino and English tweets. We validate the value of this manually produced resource by demonstrating that an automatic emotion-prediction method based on the use of a publicly available word-emotion association lexicon was unable to reproduce the labels assigned via crowdsourcing. While we are planning to make a few extensions to the corpus in the near future, its current version has been made publicly available in order to foster the development of emotion analysis methods based on advanced Filipino and English NLP.