Multilingual Approaches to Sentiment Analysis of Texts in Linguistically Diverse Languages: A Case Study of Finnish, Hungarian, and Bulgarian

Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, Alexander Gelbukh


Abstract
This article studies multilingual approaches to sentiment analysis of texts in Finnish, Hungarian, and Bulgarian. Finnish and Hungarian, which are characterized by complex morphology and agglutinative grammar, were analyzed using both traditional rule-based methods and modern machine learning techniques. The BERT, XLM-R, and mBERT models were used for sentiment classification. Bulgarian was included to allow comparison across languages with varying degrees of morphological complexity, which gave a better understanding of how these models adapt to different linguistic structures. Model performance was evaluated on datasets such as the Hungarian Emotion Corpus, FinnSentiment, and SentiFi. The transformer-based models (BERT, XLM-R, and mBERT) significantly outperformed traditional methods, achieving high accuracy in sentiment classification for all the languages studied.
Anthology ID:
2024.iwclul-1.6
Volume:
Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages
Month:
November
Year:
2024
Address:
Helsinki, Finland
Editors:
Mika Hämäläinen, Flammie Pirinen, Melany Macias, Mario Crespo Avila
Venue:
IWCLUL
SIG:
SIGUR
Publisher:
Association for Computational Linguistics
Pages:
49–58
URL:
https://aclanthology.org/2024.iwclul-1.6
Cite (ACL):
Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, and Alexander Gelbukh. 2024. Multilingual Approaches to Sentiment Analysis of Texts in Linguistically Diverse Languages: A Case Study of Finnish, Hungarian, and Bulgarian. In Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages, pages 49–58, Helsinki, Finland. Association for Computational Linguistics.
Cite (Informal):
Multilingual Approaches to Sentiment Analysis of Texts in Linguistically Diverse Languages: A Case Study of Finnish, Hungarian, and Bulgarian (Krasitskii et al., IWCLUL 2024)
PDF:
https://aclanthology.org/2024.iwclul-1.6.pdf