Uncovering the Limits of Text-based Emotion Detection

Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez


Abstract
Identifying emotions from text is crucial for a variety of real-world tasks. We consider the two largest corpora currently available for emotion classification: GoEmotions, with 58k messages labelled by readers, and Vent, with 33M writer-labelled messages. We design a benchmark and evaluate several feature spaces and learning algorithms, including two simple yet novel models on top of BERT that outperform previous strong baselines on GoEmotions. Through an experiment with human participants, we also analyze the differences between how writers express emotions and how readers perceive them. Our results suggest that emotions expressed by writers are harder to identify than emotions that readers perceive. We share a public web interface for researchers to explore our models.
Anthology ID:
2021.findings-emnlp.219
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
EMNLP | Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2560–2583
URL:
https://aclanthology.org/2021.findings-emnlp.219
DOI:
10.18653/v1/2021.findings-emnlp.219
PDF:
https://aclanthology.org/2021.findings-emnlp.219.pdf
Code:
nur-ag/emotion-ui
Data:
GoEmotions, Vent