Hema Banati


Understanding the Sarcastic Nature of Emojis with SarcOji
Vandita Grover | Hema Banati
Proceedings of the Fifth International Workshop on Emoji Understanding and Applications in Social Media

Identifying sarcasm is a challenging research problem owing to its highly contextual nature. Researchers have tried numerous mechanisms to detect sarcasm, incorporating context, linguistic aspects, and supervised and semi-supervised techniques. Emojis in a text may also hold key indicators of sarcasm. However, sarcasm datasets with emojis are scarce, which makes it difficult to study the sarcastic nature of emojis effectively. In this work, we present SarcOji, which has been compiled from five publicly available sarcasm datasets. SarcOji contains labeled English texts, all of which contain emojis. We analyze SarcOji to determine whether there is an incongruence between the polarity of the text and that of the emojis used in it. We also study emoji usage, occurrences, and positions in the context of sarcasm. With SarcOji we demonstrate that the frequency of occurrence of an emoji and its position are strong indicators of sarcasm. The SarcOji dataset is now publicly available with several derived features, such as sentiment scores of the text and emojis, the most frequent emoji, and its position in the text. Compiling the SarcOji dataset is an initial step toward enabling the study of the role of emojis in communicating sarcasm. The dataset can also serve as a go-to dataset for various emoji-based sarcasm detection techniques.
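
To make two of the derived features named in the abstract more concrete (the most frequent emoji and its position in the text), the following is a minimal Python sketch. It is not the authors' implementation: the function name `most_frequent_emoji`, the regular expression `EMOJI_RE`, and the start/middle/end bucketing by thirds are all assumptions made for illustration, and the abstract does not say how SarcOji encodes position.

```python
import re
from collections import Counter

# Rough emoji matcher covering common emoji blocks only. This is an
# assumption for illustration: it ignores ZWJ sequences, flags, and
# skin-tone modifiers, and is not how SarcOji was necessarily built.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def most_frequent_emoji(text):
    """Return (emoji, count, position) for the most frequent emoji, or None.

    position is a hypothetical coarse label ("start", "middle", "end")
    based on where the emoji first occurs, measured in code points.
    """
    matches = [(m.group(), m.start()) for m in EMOJI_RE.finditer(text)]
    if not matches:
        return None

    # Ties are resolved in favor of the emoji encountered first.
    counts = Counter(emoji for emoji, _ in matches)
    emoji, count = counts.most_common(1)[0]

    # Index of the first occurrence of the winning emoji.
    first_index = next(i for e, i in matches if e == emoji)

    # Bucket the relative offset into thirds of the text.
    relative = first_index / len(text)
    if relative < 1 / 3:
        position = "start"
    elif relative < 2 / 3:
        position = "middle"
    else:
        position = "end"
    return emoji, count, position
```

For example, `most_frequent_emoji("I just love waiting in line 🙄🙄🙄")` reports the eye-roll emoji with a count of 3 at the end of the text, while a text without emojis yields `None`.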