Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods

Jinrui Yang, Fan Jiang, Timothy Baldwin


Abstract
Language fairness in multilingual information retrieval (MLIR) systems is crucial for ensuring equitable access to information across diverse languages. This paper sheds light on the issue, based on the assumption that queries in different languages, but with identical semantics, should yield equivalent ranking lists when retrieving over the same multilingual document collection. We evaluate the degree of fairness using both traditional retrieval methods and a DPR neural ranker based on mBERT and XLM-R. Additionally, we introduce ‘LaKDA’, a novel loss designed to mitigate language biases in neural MLIR approaches. Our analysis exposes intrinsic language biases in current MLIR technologies, with notable disparities across retrieval methods, and demonstrates the effectiveness of LaKDA in enhancing language fairness.
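The fairness assumption stated in the abstract can be made concrete: retrieve against the same collection with semantically equivalent queries in two languages, then measure how closely the resulting ranking lists agree. The following is a minimal sketch of that idea; the `retrieve` placeholder and the use of Kendall's tau over the shared top-k documents are illustrative assumptions, not the paper's actual evaluation protocol or the LaKDA loss.

```python
# Minimal sketch (not the paper's code): quantify the abstract's fairness
# assumption -- semantically identical queries in different languages
# should yield equivalent rankings over the same multilingual collection.
from scipy.stats import kendalltau

def retrieve(query: str, doc_ids: list[str]) -> list[str]:
    """Placeholder retriever: return doc_ids ranked by relevance to query.
    Plug in any MLIR system here (e.g., BM25 or a DPR-style dense ranker)."""
    raise NotImplementedError

def ranking_agreement(query_a: str, query_b: str,
                      doc_ids: list[str], k: int = 100) -> float:
    """Kendall's tau between two top-k rankings; tau = 1.0 means the two
    language versions of the query produced identical orderings."""
    rank_a = retrieve(query_a, doc_ids)[:k]
    rank_b = retrieve(query_b, doc_ids)[:k]
    shared = [d for d in rank_a if d in rank_b]  # docs in both top-k lists
    tau, _ = kendalltau([rank_a.index(d) for d in shared],
                        [rank_b.index(d) for d in shared])
    return tau  # lower tau indicates stronger language bias
```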
Anthology ID:
2024.mrl-1.23
Volume:
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Jonne Sälevä, Abraham Owodunni
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
280–292
URL:
https://aclanthology.org/2024.mrl-1.23
Cite (ACL):
Jinrui Yang, Fan Jiang, and Timothy Baldwin. 2024. Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 280–292, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods (Yang et al., MRL 2024)
PDF:
https://aclanthology.org/2024.mrl-1.23.pdf