TransCasm: A Bilingual Corpus of Sarcastic Tweets

Desline Simon, Sheila Castilho, Pintu Lohar, Haithem Afli


Abstract
Sarcasm is widely used in User Generated Content (UGC) to express discontent, especially in blogs, forums, and social media such as Twitter. Several works have attempted to detect and analyse sarcasm in UGC. However, the lack of freely available corpora in this field makes the task more difficult. In this work, we present the “TransCasm” corpus, a parallel corpus of sarcastic tweets translated from English into French along with their non-sarcastic representations. To build this bilingual corpus, we select the “SIGN” corpus, a monolingual data set of sarcastic tweets and their non-sarcastic interpretations created by Peled and Reichart (2017). We propose linguistic guidelines for developing “TransCasm”, which is the first ever bilingual corpus of sarcastic tweets. In addition, we use “TransCasm” to build a binary sarcasm classifier that identifies whether a tweet is sarcastic or not. Our experiment shows that the classifier achieves 61% accuracy in detecting sarcasm in tweets. “TransCasm” is now freely available online and is ready to be explored for further research.
Anthology ID:
2022.politicalnlp-1.14
Volume:
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Haithem Afli, Mehwish Alam, Houda Bouamor, Cristina Blasi Casagran, Colleen Boland, Sahar Ghannay
Venue:
PoliticalNLP
Publisher:
European Language Resources Association
Pages:
98–103
URL:
https://aclanthology.org/2022.politicalnlp-1.14
Cite (ACL):
Desline Simon, Sheila Castilho, Pintu Lohar, and Haithem Afli. 2022. TransCasm: A Bilingual Corpus of Sarcastic Tweets. In Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences, pages 98–103, Marseille, France. European Language Resources Association.
Cite (Informal):
TransCasm: A Bilingual Corpus of Sarcastic Tweets (Simon et al., PoliticalNLP 2022)
PDF:
https://aclanthology.org/2022.politicalnlp-1.14.pdf