Zoyah Afsheen Sayeed
2025
Benchmark Creation for Aspect-Based Sentiment Analysis in Low-Resource Odia Language and Evaluation through Fine-Tuning of Multilingual Models
Lipika Dewangan, Zoyah Afsheen Sayeed, Chandresh Maurya
Proceedings of the 31st International Conference on Computational Linguistics
The rapid growth of online product reviews has spurred significant interest in Aspect-Based Sentiment Analysis (ABSA), which involves identifying aspect terms and their associated sentiment polarity. While ABSA is widely studied in resource-rich languages such as English, Chinese, and Spanish, it remains underexplored in low-resource languages such as Odia. To address this gap, we create a reliable resource for aspect-based sentiment analysis in Odia. The dataset is annotated for two tasks, Aspect Term Extraction (ATE) and Aspect Polarity Classification (APC); it spans seven domains and is aligned with the SemEval-2014 benchmark. Furthermore, we employ an ensemble data augmentation approach that combines back-translation with a fine-tuned T5 paraphrase generation model to expand the dataset, and we apply a semantic similarity filter based on the Universal Sentence Encoder (USE) to remove low-quality samples and ensure a balanced distribution of sample difficulty in the augmented dataset. Finally, we validate the dataset by fine-tuning the multilingual pre-trained models XLM-R and IndicBERT on the ATE and APC tasks. Additionally, we evaluate three classical baseline models on these tasks to assess the quality of the proposed dataset. We hope the Odia dataset will spur further work on ABSA.
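The abstract says augmented sentences are passed through a USE-based semantic similarity filter that both removes low-quality samples and balances sample difficulty. The sketch below is a hypothetical illustration, not the authors' procedure. It assumes the source and augmented sentences have already been embedded (for example with USE), and it keeps an augmented sentence only if its cosine similarity to the source falls in a band. Too low suggests meaning drift; too high suggests a near-duplicate. It then treats the similarity bin as a rough difficulty proxy and downsamples bins to equal size. The function names `cosine` and `filter_augmented`, the thresholds `low=0.6` and `high=0.95`, and `n_bins=3` are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def filter_augmented(
    pairs: List[Tuple[str, Sequence[float], Sequence[float]]],
    low: float = 0.6,    # hypothetical lower bound: below this, meaning likely drifted
    high: float = 0.95,  # hypothetical upper bound: above this, a near-duplicate
    n_bins: int = 3,     # hypothetical number of difficulty bins
) -> List[Tuple[str, float]]:
    """Keep augmented texts whose similarity to the source lies in [low, high],
    then balance them across similarity bins used as a difficulty proxy.

    Each pair is (augmented_text, source_embedding, augmented_embedding);
    embeddings are assumed to be precomputed, e.g. with a sentence encoder.
    """
    kept = []
    for aug_text, src_emb, aug_emb in pairs:
        s = cosine(src_emb, aug_emb)
        if low <= s <= high:
            kept.append((aug_text, s))

    # Assign each kept sample to an equal-width similarity bin;
    # lower similarity is treated as "harder" (more lexically different).
    width = (high - low) / n_bins
    bins: List[List[Tuple[str, float]]] = [[] for _ in range(n_bins)]
    for aug_text, s in kept:
        idx = min(int((s - low) / width), n_bins - 1)
        bins[idx].append((aug_text, s))

    # Downsample every non-empty bin to the size of the smallest one,
    # keeping the lowest-similarity items within each bin deterministically.
    non_empty = [b for b in bins if b]
    if not non_empty:
        return []
    k = min(len(b) for b in non_empty)
    balanced: List[Tuple[str, float]] = []
    for b in non_empty:
        balanced.extend(sorted(b, key=lambda item: item[1])[:k])
    return balanced
```

For example, an exact paraphrase duplicate (similarity 1.0) and an unrelated output (similarity 0.0) are both dropped. The survivors are then evened out across bins, so no single difficulty level dominates. Downsampling is only one way to balance. Reweighting, or generating more paraphrases for under-filled bins, would be equally plausible alternatives.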