Ciron: a New Benchmark Dataset for Chinese Irony Detection

Rong Xiang, Xuefeng Gao, Yunfei Long, Anran Li, Emmanuele Chersoni, Qin Lu, Chu-Ren Huang


Abstract
Automatic Chinese irony detection is a challenging task with a strong impact on linguistic research. However, the task often lacks labeled benchmark datasets. In this paper, we introduce Ciron, the first available Chinese benchmark dataset for irony detection with machine learning models. Ciron includes more than 8.7K posts collected from Weibo, a microblogging platform. Most importantly, Ciron was collected with no pre-conditions, ensuring much wider coverage. Evaluation with seven different machine learning classifiers demonstrates the usefulness of Ciron as an important resource for Chinese irony detection.
Anthology ID:
2020.lrec-1.701
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5714–5720
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.701
Cite (ACL):
Rong Xiang, Xuefeng Gao, Yunfei Long, Anran Li, Emmanuele Chersoni, Qin Lu, and Chu-Ren Huang. 2020. Ciron: a New Benchmark Dataset for Chinese Irony Detection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5714–5720, Marseille, France. European Language Resources Association.
Cite (Informal):
Ciron: a New Benchmark Dataset for Chinese Irony Detection (Xiang et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.701.pdf