RED: A Novel Dataset for Romanian Emotion Detection from Tweets

Alexandra Ciobotaru, Liviu P. Dinu


Abstract
For the Romanian language there are some resources for automatic text comprehension, but none for Emotion Detection that are not lexicon-based. To cover this gap, we extracted data from Twitter and created the first dataset containing tweets annotated with five types of emotions: joy, fear, sadness, anger and neutral, intended for use in opinion mining and analysis tasks. In this article we present some features of our novel dataset and create a benchmark toward the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches such as fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification, and our experiments show that the BERT-based model has the best performance for the task of Emotion Detection from Romanian tweets.
Keywords: Emotion Detection, Twitter, Romanian, Supervised Machine Learning
Anthology ID:
2021.ranlp-1.34
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
291–300
URL:
https://aclanthology.org/2021.ranlp-1.34
Cite (ACL):
Alexandra Ciobotaru and Liviu P. Dinu. 2021. RED: A Novel Dataset for Romanian Emotion Detection from Tweets. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 291–300, Held Online. INCOMA Ltd.
Cite (Informal):
RED: A Novel Dataset for Romanian Emotion Detection from Tweets (Ciobotaru & Dinu, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.34.pdf
Data
ISEAR