Evaluating Icelandic Sentiment Analysis Models Trained on Translated Data

Ólafur A. Jóhannsson, Birkir H. Arndal, Eysteinn Ö. Jónsson, Stefan Olafsson, Hrafn Loftsson


Abstract
We experiment with sentiment classification models for Icelandic that leverage machine-translated data for training. Since no large sentiment dataset exists for Icelandic, we translate 50,000 English IMDb reviews, each classified as either positive or negative, into Icelandic using two services: Google Translate and GreynirTranslate. After machine translation, we assess whether the sentiment of the source language text is retained in the target language. Moreover, we evaluate the accuracy of the sentiment classifiers on non-translated Icelandic text. We compare the performance of three types of baseline classifiers, namely Support Vector Machines, Logistic Regression and Naive Bayes, when trained on translated data generated by either translation service. Furthermore, we fine-tune and evaluate three pre-trained transformer-based models, RoBERTa, IceBERT and ELECTRA, on both the original English texts and the translated texts. Our results indicate that the transformer models perform better than the baseline classifiers on all datasets. Moreover, our evaluation shows that the transformer models trained on data translated from English reviews can be used to effectively classify sentiment on non-translated Icelandic movie reviews.
Anthology ID:
2024.sigul-1.11
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
79–89
URL:
https://aclanthology.org/2024.sigul-1.11
Cite (ACL):
Ólafur A. Jóhannsson, Birkir H. Arndal, Eysteinn Ö. Jónsson, Stefan Olafsson, and Hrafn Loftsson. 2024. Evaluating Icelandic Sentiment Analysis Models Trained on Translated Data. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 79–89, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Evaluating Icelandic Sentiment Analysis Models Trained on Translated Data (Jóhannsson et al., SIGUL-WS 2024)
PDF:
https://aclanthology.org/2024.sigul-1.11.pdf