Mihai Vlad Constantinescu
2022
RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection
Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu, Stefan Dumitrescu
Proceedings of the Thirteenth Language Resources and Evaluation Conference
RED (Romanian Emotion Dataset) is a machine learning-based resource developed for the automatic detection of emotions in Romanian texts. It contains tweets, each annotated with a single label from the following set: joy, fear, sadness, anger and neutral. In this work, we propose REDv2, an open-source extension of RED that adds two more emotions, trust and surprise, and widens the annotation schema so that the resulting dataset is multi-label. We show the overall reliability of our dataset by computing inter-annotator agreement per tweet, using a formula suited to our annotation setup, and we aggregate all annotators’ opinions into two variants of ground truth: one suitable for multi-label classification and the other for text regression. We propose strong baselines with two transformer models, the Romanian BERT and the multilingual XLM-RoBERTa model, in both categorical and regression settings.
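The abstract does not spell out how annotators' votes become the two ground-truth variants, nor which per-tweet agreement formula is used. The following is a minimal Python sketch under stated assumptions only. It assumes each tweet's annotations are a list of label sets, one per annotator. The regression target is taken to be the share of annotators who chose each emotion. The multi-label target thresholds that share at a hypothetical cut-off of 0.5. The agreement function uses mean pairwise Jaccard similarity, a swapped-in illustrative measure and not the formula from the paper. All function names and the label ordering are hypothetical.

```python
from itertools import combinations

# Assumed label set: RED's five classes plus the two added in REDv2.
# The ordering is an arbitrary choice for this sketch.
EMOTIONS = ("joy", "fear", "sadness", "anger", "neutral", "trust", "surprise")


def regression_target(annotations: list[set[str]]) -> list[float]:
    """Share of annotators who selected each emotion (hypothetical aggregation rule)."""
    if not annotations:
        raise ValueError("a tweet needs at least one annotation")
    n = len(annotations)
    return [sum(e in labels for labels in annotations) / n for e in EMOTIONS]


def multilabel_target(annotations: list[set[str]], threshold: float = 0.5) -> list[int]:
    """Binary vector: 1 where the vote share reaches `threshold` (assumed cut-off, not from the paper)."""
    return [int(share >= threshold) for share in regression_target(annotations)]


def pairwise_jaccard_agreement(annotations: list[set[str]]) -> float:
    """Mean pairwise Jaccard similarity between annotators' label sets for one tweet.

    Illustrative stand-in for a per-tweet agreement score; NOT the paper's formula.
    """
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return 1.0  # a single annotator trivially agrees with themself
    scores = []
    for a, b in pairs:
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three hypothetical annotators labelling the same tweet.
    tweet_annotations = [{"joy", "trust"}, {"joy"}, {"joy", "surprise"}]
    print(regression_target(tweet_annotations))
    print(multilabel_target(tweet_annotations))
    print(round(pairwise_jaccard_agreement(tweet_annotations), 3))
```

For the example tweet, "joy" gets a regression score of 1.0 while "trust" and "surprise" each get about 0.33. Under the assumed 0.5 threshold, only "joy" survives in the multi-label vector. Keeping the soft scores as a separate target preserves the annotators' partial disagreement, which a thresholded label discards.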