Reem Alotibi
2026
Improving on State-of-the-Art Models for Sentiment Analysis on Saudi-English Code-Switching Text
Samaher Alghamdi | Paul Rayson | Reem Alotibi
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Inserting English words, phrases, or sentences while writing or speaking in the Saudi Arabic dialect has become widespread in Saudi society, a phenomenon known in linguistics as code-switching. It remains unclear how current sentiment analysis methods perform on Saudi-English code-switching text. In this paper, we address this gap by conducting the first sentiment analysis study on such text. We present the first Saudi-English Sentiment Analysis Code Switching Dataset (SESA-CSD) and establish baseline results on it. By evaluating multiple state-of-the-art small language models, we achieve improvements over the baseline of 3% to 11% in both accuracy and macro-F1. Among all small language models, XLM-RoBERTa achieved the highest performance, with an accuracy of 95.50% and a macro-F1 of 95.53%. Our findings indicate that multilingual and Arabic small language models, such as XLM-RoBERTa, GigaBERT, and SaudiBERT, consistently outperform bilingual Arabic-English large language models, such as Fanar and ALLaM, across zero-shot and multiple few-shot settings.