Cross-Lingual Named Entity Recognition for Low-Resource Languages: A Hindi-Nepali Case Study Using Multilingual BERT Models

Dipendra Yadav, Sumaiya Suravee, Tobias Strauß, Kristina Yordanova


Abstract
This study investigates the potential of cross-lingual transfer learning for Named Entity Recognition (NER) between Hindi and Nepali, two languages that, despite their linguistic similarities, face significant disparities in available resources. By leveraging multilingual BERT models, including RemBERT, BERT Multilingual, MuRIL, and DistilBERT Multilingual, the research examines whether training these models on a resource-rich language like Hindi can enhance NER performance in a resource-constrained language like Nepali, and vice versa. The study conducts experiments in both monolingual and cross-lingual settings to evaluate the models' effectiveness in transferring linguistic knowledge between the two languages. The findings reveal that while RemBERT and MuRIL perform well in monolingual contexts, with RemBERT excelling in Hindi and MuRIL in Nepali, BERT Multilingual performs best in cross-lingual scenarios, generalizing features across the two languages most effectively. Although DistilBERT Multilingual demonstrates slightly lower performance in cross-lingual tasks, it balances efficiency with competitive results. The study underscores the importance of model selection based on linguistic and resource-specific contexts, highlighting that general-purpose models like BERT Multilingual are particularly well-suited for cross-lingual applications.
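For context, the cross-lingual setup the abstract describes (fine-tune a multilingual encoder on Hindi NER data, then evaluate it on Nepali) can be approximated with the Hugging Face Transformers library. The sketch below is illustrative rather than the authors' code: the checkpoint choice, label set, dataset files, and hyperparameters are assumptions.

```python
# Minimal sketch of Hindi->Nepali cross-lingual NER transfer (illustrative only;
# dataset files, label set, and hyperparameters are assumed, not from the paper).
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)
from datasets import load_dataset

MODEL_NAME = "bert-base-multilingual-cased"  # alternatives: google/muril-base-cased, google/rembert, distilbert-base-multilingual-cased
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # assumed tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

# Hypothetical JSON files with "tokens" (word list) and "ner_tags" (label ids) columns.
data = load_dataset("json", data_files={"train": "hindi_ner_train.json",
                                        "validation": "nepali_ner_test.json"})

def tokenize_and_align(batch):
    # Tokenize pre-split words and align word-level tags to sub-word pieces,
    # marking special tokens and continuation pieces with -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, lab = None, []
        for wid in enc.word_ids(batch_index=i):
            lab.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        labels.append(lab)
    enc["labels"] = labels
    return enc

encoded = data.map(tokenize_and_align, batched=True,
                   remove_columns=data["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("hi2ne-ner", num_train_epochs=3, per_device_train_batch_size=16),
    train_dataset=encoded["train"],        # Hindi training split
    eval_dataset=encoded["validation"],    # Nepali evaluation split (cross-lingual)
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())  # reports loss; entity-level F1 would need a seqeval-based compute_metrics
```

Swapping MODEL_NAME lets the same script compare the four encoders the paper studies; reversing the train and evaluation files gives the Nepali-to-Hindi direction.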
Anthology ID:
2024.mrl-1.12
Volume:
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Jonne Sälevä, Abraham Owodunni
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
167–174
URL:
https://aclanthology.org/2024.mrl-1.12
Cite (ACL):
Dipendra Yadav, Sumaiya Suravee, Tobias Strauß, and Kristina Yordanova. 2024. Cross-Lingual Named Entity Recognition for Low-Resource Languages: A Hindi-Nepali Case Study Using Multilingual BERT Models. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 167–174, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Named Entity Recognition for Low-Resource Languages: A Hindi-Nepali Case Study Using Multilingual BERT Models (Yadav et al., MRL 2024)
PDF:
https://aclanthology.org/2024.mrl-1.12.pdf