BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

Dipankar Srirag; Aditya Joshi; Jordan Painter; Diptesh Kanojia

doi:10.18653/v1/2025.findings-acl.441

Abstract

Although large language models (LLMs) are known to exhibit bias against non-mainstream language varieties, there are no known labeled datasets for sentiment analysis of varieties of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). Using web-based content from two domains, namely Google Place reviews and Reddit comments, we collect datasets for these language varieties using two methods: location-based and topic-based filtering. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. To assess whether the dataset accurately represents these varieties, we conduct two validation steps: (a) manual annotation of language varieties and (b) automatic language variety prediction. We perform an additional annotation exercise to validate the reliability of the annotated labels. Subsequently, we fine-tune nine LLMs (representing a range of encoder/decoder and mono/multilingual models) on these datasets and evaluate their performance on the two tasks. Our results reveal that the models consistently perform better on inner-circle varieties (i.e., en-AU and en-UK), with significant performance drops for en-IN, particularly in sarcasm detection. We also report challenges in cross-variety generalisation, highlighting the need for language variety-specific datasets such as ours. BESSTIE promises to be a useful evaluative benchmark for future research in equitable LLMs, specifically in terms of language varieties. The BESSTIE dataset is publicly available at: https://huggingface.co/datasets/unswnlporg/BESSTIE.

Anthology ID: 2025.findings-acl.441
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8413–8429
URL: https://aclanthology.org/2025.findings-acl.441/
DOI: 10.18653/v1/2025.findings-acl.441
Cite (ACL): Dipankar Srirag, Aditya Joshi, Jordan Painter, and Diptesh Kanojia. 2025. BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8413–8429, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English (Srirag et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.441.pdf
