DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter

Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, Faten Ashour


Abstract
Identifying irony in user-generated social media content has a wide range of applications; however, to date, Arabic content has received limited attention. To bridge this gap, this study builds a new open-domain Arabic corpus annotated for irony detection. We query Twitter using irony-related hashtags to collect ironic messages, which are then manually annotated by two linguists according to our working definition of irony. The challenges we encountered during the annotation process reflect the inherent limitations of interpreting Twitter messages, as well as the complexity of Arabic and its dialects. Once published, our corpus will be a valuable free resource for developing open-domain systems for automatic irony recognition in Arabic and its dialects in social media text.
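The hashtag-based collection step mentioned in the abstract can be pictured as a simple filter over already-downloaded tweets. The sketch below is illustrative only. The seed hashtags (`#سخرية`, `#تهكم`, `#sarcasm`) and the helper names `hashtags` and `select_candidates` are hypothetical and are not the DAICT hashtag list or tooling. Tweets that pass the filter are only candidates for manual annotation, not confirmed ironic messages.

```python
import re

# Hypothetical seed hashtags for illustration; the actual DAICT hashtag list is not given here.
IRONY_HASHTAGS = {"#سخرية", "#تهكم", "#sarcasm"}

# In Python 3, \w in a str pattern matches Unicode word characters, including Arabic letters.
HASHTAG_RE = re.compile(r"#\w+")


def hashtags(text: str) -> set[str]:
    """Return the lowercased set of hashtags in a tweet (lowercasing only affects Latin script)."""
    return {h.lower() for h in HASHTAG_RE.findall(text)}


def select_candidates(tweets: list[str], seeds: set[str] = IRONY_HASHTAGS) -> list[str]:
    """Keep tweets carrying at least one seed hashtag.

    The result is a pool of candidates to be labelled by human annotators,
    not a set of tweets already known to be ironic.
    """
    return [t for t in tweets if hashtags(t) & seeds]


if __name__ == "__main__":
    sample = ["يا سلام على الخدمة #سخرية", "خبر عادي بدون وسم", "Great, another Monday #Sarcasm"]
    for tweet in select_candidates(sample):
        print(tweet)
```

Matching on exact hashtag tokens, rather than on substrings, avoids selecting tweets whose hashtags merely contain a seed word as a prefix.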
Anthology ID: 2020.lrec-1.768
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6265–6271
Language: English
URL: https://aclanthology.org/2020.lrec-1.768
Cite (ACL): Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, and Faten Ashour. 2020. DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6265–6271, Marseille, France. European Language Resources Association.
Cite (Informal): DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter (Abbes et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.768.pdf