On the effective transfer of knowledge from English to Hindi Wikipedia

Paramita Das, Amartya Roy, Ritabrata Chakraborty, Animesh Mukherjee


Abstract
Although Wikipedia is the largest multilingual encyclopedia, it remains inherently incomplete. Content quality differs significantly between high-resource languages (HRLs, e.g., English) and low-resource languages (LRLs, e.g., Hindi), and many LRL articles lack adequate information. To bridge these content gaps, we propose a lightweight framework to enhance knowledge equity between English and Hindi. When the English Wikipedia page is not up to date, our framework extracts relevant information from readily available external resources (such as English books) and adapts it to Wikipedia’s distinctive style, including its neutral point of view (NPOV) policy, using the in-context learning capabilities of large language models. The adapted content is then machine-translated into Hindi for integration into the corresponding Wikipedia articles. When the English version is comprehensive and up to date, the framework instead transfers knowledge directly from English to Hindi. Our framework effectively generates new content for Hindi Wikipedia sections, enhancing Hindi Wikipedia articles by 65% and 62% according to automatic and human judgment-based evaluations, respectively.
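
The two-branch flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the function name, its parameters, and the two callables (`adapt_to_wikipedia_style`, `translate_to_hindi`) are hypothetical placeholders standing in for the LLM-based style adaptation and the machine-translation step. The sketch also assumes that the external passages have already been filtered for relevance and that the freshness of the English page has been decided elsewhere.

```python
from typing import Callable, Iterable


def generate_hindi_section(
    english_section: str,
    is_up_to_date: bool,
    external_passages: Iterable[str],
    adapt_to_wikipedia_style: Callable[[str], str],
    translate_to_hindi: Callable[[str], str],
) -> str:
    """Return Hindi text for one section (hypothetical sketch).

    All parameter names are assumptions made for illustration only.
    """
    if is_up_to_date:
        # Branch 1: the English page is comprehensive and current,
        # so its content is transferred directly.
        source_text = english_section
    else:
        # Branch 2: fall back to external resources (e.g. book passages)
        # and rewrite them in Wikipedia's neutral, encyclopedic style.
        source_text = adapt_to_wikipedia_style(" ".join(external_passages))
    # Both branches end with machine translation into Hindi.
    return translate_to_hindi(source_text)
```

Passing the adaptation and translation steps in as callables keeps the sketch independent of any particular language model or translation system; in practice each would wrap whatever model the pipeline uses.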
Anthology ID:
2025.coling-industry.39
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
453–465
URL:
https://aclanthology.org/2025.coling-industry.39/
Cite (ACL):
Paramita Das, Amartya Roy, Ritabrata Chakraborty, and Animesh Mukherjee. 2025. On the effective transfer of knowledge from English to Hindi Wikipedia. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 453–465, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
On the effective transfer of knowledge from English to Hindi Wikipedia (Das et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-industry.39.pdf