EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts

Khondoker Ittehadul Islam, Tanvir Yuvraz, Md Saiful Islam, Enamul Hassan


Abstract
For the low-resource Bangla language, work on detecting emotions in textual data suffers from limited dataset size and poor cross-domain adaptability. In this paper, we present a manually annotated dataset of 22,698 Bangla public comments from social media sites covering 12 different domains such as Personal, Politics, and Health, labeled for 6 fine-grained emotion categories of the Junto Emotion Wheel. We invest effort in the data preparation to 1) preserve the linguistic richness and 2) challenge any classification model. Our experiments to develop a benchmark classification system show that random baselines perform better than neural networks and pre-trained language models, whereas hand-crafted features provide superior performance.
Anthology ID:
2022.aacl-short.17
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
128–134
URL:
https://aclanthology.org/2022.aacl-short.17
Cite (ACL):
Khondoker Ittehadul Islam, Tanvir Yuvraz, Md Saiful Islam, and Enamul Hassan. 2022. EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 128–134, Online only. Association for Computational Linguistics.
Cite (Informal):
EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts (Islam et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-short.17.pdf
Dataset:
 2022.aacl-short.17.Dataset.zip
Software:
 2022.aacl-short.17.Software.zip