MarathiEmoExplain: A Dataset for Sentiment, Emotion, and Explanation in Low-Resource Marathi

Anuj Kumar; Mohammed Faisal Sayed; Satyadev Ahlawat; Yamuna Prasad

doi:10.18653/v1/2025.findings-emnlp.712

Abstract

Marathi, the third most widely spoken language in India with over 83 million native speakers, remains significantly underrepresented in Natural Language Processing (NLP) research. While sentiment analysis has achieved substantial progress in high-resource languages such as English, Chinese, and Hindi, available Marathi datasets are limited to coarse sentiment labels and lack fine-grained emotional categorization or interpretability through explanations. To address this gap, we present a new annotated dataset of 10,762 Marathi sentences, each labeled with sentiment (positive, negative, or neutral), emotion (joy, anger, surprise, disgust, sadness, fear, or neutral), and a corresponding natural language justification. Justifications are written in English and generated using GPT-4 under a human-in-the-loop framework to ensure label fidelity and contextual alignment. Extensive experiments with both classical and transformer-based models demonstrate the effectiveness of the dataset for interpretable affective computing in a low-resource language setting, offering a benchmark for future research in multilingual and explainable NLP.
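The annotation scheme described in the abstract can be sketched as a small record type. This is a minimal illustration only: the class name `AnnotatedSentence`, its field names, and the validation logic are assumptions made here, not the released dataset's actual file format or column names. Only the two label inventories are taken from the abstract.

```python
from dataclasses import dataclass

# Label inventories as listed in the abstract.
SENTIMENTS = {"positive", "negative", "neutral"}
EMOTIONS = {"joy", "anger", "surprise", "disgust", "sadness", "fear", "neutral"}


@dataclass(frozen=True)
class AnnotatedSentence:
    """Hypothetical container for one annotated sentence (field names assumed)."""

    text: str           # Marathi sentence
    sentiment: str      # one of SENTIMENTS
    emotion: str        # one of EMOTIONS
    justification: str  # English natural-language explanation

    def __post_init__(self):
        # Reject labels outside the inventories given in the abstract.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment label: {self.sentiment!r}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {self.emotion!r}")


# Invented example for illustration; this sentence is not taken from the dataset.
example = AnnotatedSentence(
    text="मला खूप आनंद झाला",
    sentiment="positive",
    emotion="joy",
    justification="The speaker explicitly states that they felt very happy.",
)
```

Validating labels at construction time is one way to catch annotation typos early when loading such a corpus; the real distribution format may differ.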

Anthology ID: 2025.findings-emnlp.712
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13234–13243
URL: https://aclanthology.org/2025.findings-emnlp.712/
DOI: 10.18653/v1/2025.findings-emnlp.712
Cite (ACL): Anuj Kumar, Mohammed Faisal Sayed, Satyadev Ahlawat, and Yamuna Prasad. 2025. MarathiEmoExplain: A Dataset for Sentiment, Emotion, and Explanation in Low-Resource Marathi. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 13234–13243, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MarathiEmoExplain: A Dataset for Sentiment, Emotion, and Explanation in Low-Resource Marathi (Kumar et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.712.pdf
Checklist: 2025.findings-emnlp.712.checklist.pdf
