Multilingual Bias Detection and Mitigation for Indian Languages

Ankita Maity, Anubhav Sharma, Rudra Dhar, Tushar Abhishek, Manish Gupta, Vasudeva Varma


Abstract
A lack of diverse perspectives causes neutrality bias in Wikipedia content, exposing millions of readers worldwide to potentially inaccurate information. Hence, neutrality bias detection and mitigation is a critical problem. Although previous studies have proposed effective solutions for English, no work exists for Indian languages. First, we contribute two large datasets, mWIKIBIAS and mWNC, covering 8 languages, for the bias detection and mitigation tasks, respectively. Next, we investigate the effectiveness of popular multilingual Transformer-based models for the two tasks by modeling detection as a binary classification problem and mitigation as a style transfer problem. We make the code and data publicly available.
Anthology ID:
2024.wildre-1.4
Volume:
Proceedings of the 7th Workshop on Indian Language Data: Resources and Evaluation
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Girish Nath Jha, Sobha L., Kalika Bali, Atul Kr. Ojha
Venues:
WILDRE | WS
Publisher:
ELRA and ICCL
Pages:
24–29
URL:
https://aclanthology.org/2024.wildre-1.4
Cite (ACL):
Ankita Maity, Anubhav Sharma, Rudra Dhar, Tushar Abhishek, Manish Gupta, and Vasudeva Varma. 2024. Multilingual Bias Detection and Mitigation for Indian Languages. In Proceedings of the 7th Workshop on Indian Language Data: Resources and Evaluation, pages 24–29, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Multilingual Bias Detection and Mitigation for Indian Languages (Maity et al., WILDRE-WS 2024)
PDF:
https://aclanthology.org/2024.wildre-1.4.pdf