Benchmarking Large Language Models on Bangla Dialect Translation and Dialectal Sentiment Analysis

Md Mahir Jawad; Rafid Ahmed; Ishita Sur Apan; Tasnimul Hossain Tomal; Fabiha Haider; Mir Sazzat Hossain; Md Farhad Alam Bhuiyan

Abstract

We present a novel Bangla Dialect Dataset comprising 600 annotated instances across four major dialects: Chattogram, Barishal, Sylhet, and Noakhali. The dataset was constructed from YouTube comments spanning diverse domains to capture authentic dialectal variation in informal online communication. Each instance includes the original dialectal text, its standard Bangla translation, and a sentiment label (Positive or Negative). We benchmark several state-of-the-art large language models on dialect-to-standard translation and sentiment analysis using zero-shot and few-shot prompting strategies. Our experiments show that transliteration substantially improves translation quality for closed-source models, with GPT-4o-mini achieving the highest BLEU score (0.343) in the zero-shot setting with transliteration. For sentiment analysis, GPT-4o-mini achieves perfect precision, recall, and F1 scores (1.000) in the few-shot setting. This dataset addresses a critical gap in resources for low-resource Bangla dialects and provides a foundation for developing dialect-aware NLP systems.
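To make the reported BLEU score of 0.343 concrete (BLEU is expressed here on a 0 to 1 scale), the following is a minimal sketch of corpus-level BLEU in Python. It assumes whitespace tokenization, a single reference per sentence, uniform weights over 1- to 4-grams, and no smoothing. The abstract does not state which BLEU implementation or tokenizer the authors used, so the helper names `ngrams` and `corpus_bleu` are hypothetical and scores from this sketch are not expected to reproduce the paper's numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Hypothetical corpus BLEU sketch: whitespace tokens, one reference
    per hypothesis, uniform n-gram weights, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Modified precision: clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


# A hypothesis identical to its reference scores 1.0.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))
```

Scoring at the corpus level (pooling n-gram counts before taking the geometric mean) rather than averaging sentence scores is the conventional choice, since short sentences often have zero higher-order matches that would otherwise zero out individual scores.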

Anthology ID: 2025.banglalp-1.26
Volume: Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Month: December
Year: 2025
Address: Mumbai, India
Editors: Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Naeemul Hassan, Enamul Hoque Prince, Mohiuddin Tasnim, Md Rashad Al Hasan Rony, Md Tahmid Rahman Rahman
Venues: BanglaLP | WS
Publisher: Association for Computational Linguistics
Pages: 322–337
URL: https://aclanthology.org/2025.banglalp-1.26/
Cite (ACL): Md Mahir Jawad, Rafid Ahmed, Ishita Sur Apan, Tasnimul Hossain Tomal, Fabiha Haider, Mir Sazzat Hossain, and Md Farhad Alam Bhuiyan. 2025. Benchmarking Large Language Models on Bangla Dialect Translation and Dialectal Sentiment Analysis. In Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), pages 322–337, Mumbai, India. Association for Computational Linguistics.
Cite (Informal): Benchmarking Large Language Models on Bangla Dialect Translation and Dialectal Sentiment Analysis (Jawad et al., BanglaLP 2025)
PDF: https://aclanthology.org/2025.banglalp-1.26.pdf

PDF Cite Search Fix data