Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus

Giulia Donato, Patrizia Paggio


Abstract
In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji, an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets, all containing at least one emoji, and has been annotated using one of three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected and describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, also considering possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.
Anthology ID:
W17-5216
Volume:
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Alexandra Balahur, Saif M. Mohammad, Erik van der Goot
Venue:
WASSA
Publisher:
Association for Computational Linguistics
Pages:
118–126
URL:
https://aclanthology.org/W17-5216
DOI:
10.18653/v1/W17-5216
Cite (ACL):
Giulia Donato and Patrizia Paggio. 2017. Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 118–126, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus (Donato & Paggio, WASSA 2017)
PDF:
https://aclanthology.org/W17-5216.pdf
Data
MS COCO