MLInitiative@WILDRE7: Hybrid Approaches with Large Language Models for Enhanced Sentiment Analysis in Code-Switched and Code-Mixed Texts

Hariram Veeramani, Surendrabikram Thapa, Usman Naseem


Abstract
Code-switched and code-mixed languages are prevalent in multilingual societies, reflecting the complex interplay of cultures and languages in daily communication. Understanding the sentiment embedded in such texts is crucial for a range of applications, from improving social media analytics to enhancing customer feedback systems. Despite their significance, research on code-mixed and code-switched languages remains limited, particularly for less-resourced languages. This scarcity of research creates a gap in natural language processing (NLP) technologies, hindering their ability to accurately interpret the rich linguistic diversity of global communications. To bridge this gap, this paper presents a novel methodology for sentiment analysis in code-mixed and code-switched texts. Our approach combines the power of large language models (LLMs) with the versatility of the multilingual BERT (mBERT) framework to process and analyze sentiment in multilingual data. By decomposing code-mixed texts into their constituent languages, employing mBERT for named entity recognition (NER) and sentiment label prediction, and integrating these insights into a decision-making LLM, we provide a comprehensive framework for understanding sentiment in complex linguistic contexts. Our system achieves a competitive rank on all subtasks of the Code-mixed Less-Resourced Sentiment analysis (Code-mixed) shared task at WILDRE-7 (LREC-COLING).
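The abstract outlines three stages: decomposition into constituent languages, mBERT-based NER and sentiment prediction, and a final decision by an LLM. The following is a minimal Python sketch of how such stages could be wired together, not the authors' implementation. Everything named here is a hypothetical stand-in: the word-to-language lexicon replaces whatever language identification the paper uses, the `ner`, `sentiment`, and `llm` callables stand in for the mBERT and LLM components, and the three-label set is an assumption rather than the shared task's documented label scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed label set; the shared task's actual labels are not given in the abstract.
LABELS = ("positive", "negative", "neutral")


@dataclass
class Evidence:
    text: str
    segments: Dict[str, str]   # language tag -> tokens assigned to that language
    entities: List[str]        # output of the (stand-in) NER model
    preliminary_label: str     # output of the (stand-in) sentiment model


def decompose(text: str, lexicon: Dict[str, str]) -> Dict[str, str]:
    """Toy decomposition: route each token by a hypothetical word->language lexicon."""
    segments: Dict[str, List[str]] = {}
    for token in text.split():
        lang = lexicon.get(token.lower(), "unknown")
        segments.setdefault(lang, []).append(token)
    return {lang: " ".join(tokens) for lang, tokens in segments.items()}


def build_prompt(ev: Evidence) -> str:
    """Collect the intermediate signals into one prompt for the decision model."""
    lines = [f"Text: {ev.text}"]
    for lang, segment in sorted(ev.segments.items()):
        lines.append(f"{lang} part: {segment}")
    lines.append("Entities: " + (", ".join(ev.entities) or "none"))
    lines.append(f"Preliminary label: {ev.preliminary_label}")
    lines.append("Answer with one of: " + ", ".join(LABELS) + ".")
    return "\n".join(lines)


def classify(
    text: str,
    lexicon: Dict[str, str],
    ner: Callable[[str], List[str]],
    sentiment: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Run the three stages; keep the preliminary label if the LLM reply is unusable."""
    ev = Evidence(text, decompose(text, lexicon), ner(text), sentiment(text))
    answer = llm(build_prompt(ev)).strip().lower()
    return answer if answer in LABELS else ev.preliminary_label
```

Passing the models in as callables keeps the sketch runnable with simple stubs. Falling back to the preliminary label when the LLM reply is not a recognized label is a design choice of this sketch, not something the abstract states.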
Anthology ID:
2024.wildre-1.10
Volume:
Proceedings of the 7th Workshop on Indian Language Data: Resources and Evaluation
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Girish Nath Jha, Sobha L., Kalika Bali, Atul Kr. Ojha
Venues:
WILDRE | WS
Publisher:
ELRA and ICCL
Pages:
66–72
URL:
https://aclanthology.org/2024.wildre-1.10
Cite (ACL):
Hariram Veeramani, Surendrabikram Thapa, and Usman Naseem. 2024. MLInitiative@WILDRE7: Hybrid Approaches with Large Language Models for Enhanced Sentiment Analysis in Code-Switched and Code-Mixed Texts. In Proceedings of the 7th Workshop on Indian Language Data: Resources and Evaluation, pages 66–72, Torino, Italia. ELRA and ICCL.
Cite (Informal):
MLInitiative@WILDRE7: Hybrid Approaches with Large Language Models for Enhanced Sentiment Analysis in Code-Switched and Code-Mixed Texts (Veeramani et al., WILDRE-WS 2024)
PDF:
https://aclanthology.org/2024.wildre-1.10.pdf