RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection

Alexandra Ciobotaru; Mihai Vlad Constantinescu; Liviu P. Dinu; Stefan Daniel Dumitrescu

Abstract

RED (Romanian Emotion Dataset) is a machine learning resource developed for the automatic detection of emotions in Romanian texts. It contains tweets annotated with a single label each, one of the following: joy, fear, sadness, anger, or neutral. In this work, we propose REDv2, an open-source extension of RED that adds two more emotions, trust and surprise, and widens the annotation schema so that the resulting dataset is multi-label. We show the overall reliability of our dataset by computing inter-annotator agreement per tweet using a formula suited to our annotation setup, and we aggregate all annotators’ opinions into two variants of ground truth, one suitable for multi-label classification and the other for text regression. We propose strong baselines with two transformer models, Romanian BERT and the multilingual XLM-RoBERTa, in both categorical and regression settings.

Anthology ID: 2022.lrec-1.149
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1392–1399
URL: https://aclanthology.org/2022.lrec-1.149/
Cite (ACL): Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu, and Stefan Dumitrescu. 2022. RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1392–1399, Marseille, France. European Language Resources Association.
Cite (Informal): RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection (Ciobotaru et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.149.pdf
