RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian

Krenare Pireva Nuci; Paul Landes; Barbara Di Eugenio

Abstract

The education domain has been a popular area of collaboration with NLP researchers for decades. However, many recent breakthroughs, such as large transformer-based language models, have provided new opportunities for solving interesting but difficult problems. One such problem is assigning sentiment to reviews of educators’ performance. We present EduSenti: a corpus of 1,163 Albanian and 624 English reviews of educational instructors’ performance, annotated for sentiment, emotion, and educational topic. In this work, we experiment with fine-tuning several language models on the EduSenti corpus and then compare them with a masked language model trained on Albanian text from the last XLM-RoBERTa checkpoint. We show promising baseline results, which include an F1 of 71.9 in Albanian and 73.8 in English. Our contributions are: (i) a sentiment analysis corpus in Albanian and English, (ii) a large Albanian corpus of crawled data useful for unsupervised training of language models, and (iii) the source code for our experiments.

Anthology ID: 2024.lrec-main.1233
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 14146–14151
URL: https://aclanthology.org/2024.lrec-main.1233/
Cite (ACL): Krenare Pireva Nuci, Paul Landes, and Barbara Di Eugenio. 2024. RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14146–14151, Torino, Italia. ELRA and ICCL.
Cite (Informal): RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian (Nuci et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1233.pdf
