This Land is Your, My Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes

Bryan Li, Samar Haider, Chris Callison-Burch


Abstract
Do the Spratly Islands belong to China, the Philippines, or Vietnam? A pretrained large language model (LLM) may answer differently if asked in the languages of each claimant country: Chinese, Tagalog, or Vietnamese. This contrasts with a multilingual human, who would likely answer consistently. In this paper, we show that LLMs recall certain geographical knowledge inconsistently when queried in different languages—a phenomenon we term geopolitical bias. As a targeted case study, we consider territorial disputes, an inherently controversial and multilingual task. We introduce BorderLines, a dataset of territorial disputes which covers 251 territories, each associated with a set of multiple-choice questions in the languages of each claimant country (49 languages in total). We also propose a suite of evaluation metrics to precisely quantify bias and consistency in responses across different languages. We then evaluate various multilingual LLMs on our dataset and metrics to probe their internal knowledge and use the proposed metrics to discover numerous inconsistencies in how these models respond in different languages. Finally, we explore several prompt modification strategies, aiming to either amplify or mitigate geopolitical bias, which highlights how brittle LLMs are and how they tailor their responses depending on cues from the interaction context. Our code and data are available at https://github.com/manestay/borderlines.
Anthology ID: 2024.naacl-long.213
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3855–3871
URL: https://aclanthology.org/2024.naacl-long.213
Cite (ACL): Bryan Li, Samar Haider, and Chris Callison-Burch. 2024. This Land is Your, My Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3855–3871, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): This Land is Your, My Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes (Li et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.213.pdf
Copyright: 2024.naacl-long.213.copyright.pdf