Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

Sankalp Bahad, Pruthwik Mishra, Parameswari Krishnamurthy, Dipti Sharma


Abstract
Named Entity Recognition (NER) is a use-ful component in Natural Language Process-ing (NLP) applications. It is used in varioustasks such as Machine Translation, Summa-rization, Information Retrieval, and Question-Answering systems. The research on NER iscentered around English and some other ma-jor languages, whereas limited attention hasbeen given to Indian languages. We analyze thechallenges and propose techniques that can betailored for Multilingual Named Entity Recog-nition for Indian Languages. We present a hu-man annotated named entity corpora of ∼40Ksentences for 4 Indian languages from two ofthe major Indian language families. Addition-ally, we show the transfer learning capabilitiesof pre-trained transformer models from a highresource language to multiple low resource lan-guages through a series of experiments. Wealso present a multilingual model fine-tunedon our dataset, which achieves an F1 score of∼0.80 on our dataset on average. We achievecomparable performance on completely unseenbenchmark datasets for Indian languages whichaffirms the usability of our model.
Anthology ID:
2024.naacl-srw.9
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
75–82
Language:
URL:
https://aclanthology.org/2024.naacl-srw.9
DOI:
10.18653/v1/2024.naacl-srw.9
Bibkey:
Cite (ACL):
Sankalp Bahad, Pruthwik Mishra, Parameswari Krishnamurthy, and Dipti Sharma. 2024. Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 75–82, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages (Bahad et al., NAACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.naacl-srw.9.pdf