An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter

Sahar Omidi Shayegan, Isar Nejadgholi, Kellin Pelrine, Hao Yu, Sacha Levy, Zachary Yang, Jean-François Godbout, Reihaneh Rabbany


Abstract
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speaking social media users from their posts. However, assessing LLMs on this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users on Persian social media. Because political parties are not well defined among Persian-speaking users, we reduce the problem to the simpler task of hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyperpartisan ideologies of users. We compare these models with smaller models fine-tuned on Persian text (ParsBERT) and on translated data (RoBERTa), and find that the fine-tuned models considerably outperform the generative LLMs on this task. We further show that the performance of the generative LLMs degrades when users are classified from their tweets rather than their bios, and even when tweets are added as supplementary information, whereas the smaller fine-tuned models remain robust and achieve similar performance across all classes. This study is a first step toward political ideology detection on Persian Twitter, with implications for future research on the dynamics of ideologies in Persian social media.
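
To make the fine-tuned baseline concrete, the sketch below shows how a ParsBERT classifier of the kind described in the abstract could be set up with Hugging Face transformers. The checkpoint name, label set, and hyperparameters here are illustrative assumptions, not the authors' exact configuration or data.

# Illustrative sketch: fine-tuning a ParsBERT classifier for hyperpartisan
# ideology detection. Checkpoint, labels, and hyperparameters are assumptions
# for demonstration only, not the paper's reported setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Hypothetical labeled examples: user bios (or concatenated tweets) paired
# with hyperpartisan ideology labels.
examples = {
    "text": ["... bio or tweets of user 1 ...", "... bio or tweets of user 2 ..."],
    "label": [0, 1],  # e.g. 0 = one hyperpartisan camp, 1 = the other (assumed binary)
}
dataset = Dataset.from_dict(examples).train_test_split(test_size=0.2)

model_name = "HooshvareLab/bert-base-parsbert-uncased"  # a public ParsBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Truncate/pad short Persian texts (bios, tweets) to a fixed length.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="parsbert-hyperpartisan",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()

The same recipe applies to the translated-data baseline by swapping the checkpoint for an English RoBERTa model and feeding translated bios or tweets.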
Anthology ID:
2024.eurali-1.8
Volume:
Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Atul Kr. Ojha, Sina Ahmadi, Silvie Cinková, Theodorus Fransen, Chao-Hong Liu, John P. McCrae
Venues:
EURALI | WS
Publisher:
ELRA and ICCL
Pages:
51–62
URL:
https://aclanthology.org/2024.eurali-1.8
Cite (ACL):
Sahar Omidi Shayegan, Isar Nejadgholi, Kellin Pelrine, Hao Yu, Sacha Levy, Zachary Yang, Jean-François Godbout, and Reihaneh Rabbany. 2024. An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter. In Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024, pages 51–62, Torino, Italia. ELRA and ICCL.
Cite (Informal):
An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter (Omidi Shayegan et al., EURALI-WS 2024)
PDF:
https://aclanthology.org/2024.eurali-1.8.pdf