SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles

Dilshod Azizov, Zain Mujahid, Hilal AlQuabeh, Preslav Nakov, Shangsong Liang


Abstract
In an era where information is quickly shared across many cultural and language contexts, the neutrality and integrity of news media are essential. Ensuring that media content remains unbiased and factual is crucial for maintaining public trust. With this in mind, we introduce SAFARI (CroSs-lingual BiAs and Factuality Detection in News MediA and News ARtIcles), a novel corpus of news media and articles for predicting political bias and the factuality of reporting in a multilingual and cross-lingual setup. To the best of our knowledge, this corpus is the first of its kind, providing data for political bias and factuality prediction across three tasks: (i) media-level, (ii) article-level, and (iii) joint modeling at the article level. At the media and article levels, we evaluate the cross-lingual ability of the models; in joint modeling, however, we evaluate on English data. Our frameworks set a new benchmark in the cross-lingual evaluation of political bias and factuality, achieved by combining various Multilingual Pre-trained Language Models (MPLMs) and Large Language Models (LLMs) with ensemble learning methods.
Anthology ID:
2024.findings-emnlp.712
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12217–12231
URL:
https://aclanthology.org/2024.findings-emnlp.712
Cite (ACL):
Dilshod Azizov, Zain Mujahid, Hilal AlQuabeh, Preslav Nakov, and Shangsong Liang. 2024. SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 12217–12231, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles (Azizov et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.712.pdf