Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

Sankalp Bahad; Pruthwik Mishra; Parameswari Krishnamurthy; Dipti Misra Sharma

doi:10.18653/v1/2024.naacl-srw.9

Abstract

Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. The research on NER is centered around English and some other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present human-annotated named entity corpora of ∼40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we show the transfer learning capabilities of pre-trained transformer models from a high-resource language to multiple low-resource languages through a series of experiments. We also present a multilingual model fine-tuned on our dataset, which achieves an average F1 score of ∼0.80 on our dataset. We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.

Anthology ID: 2024.naacl-srw.9
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 75–82
URL: https://aclanthology.org/2024.naacl-srw.9/
DOI: 10.18653/v1/2024.naacl-srw.9
Cite (ACL): Sankalp Bahad, Pruthwik Mishra, Parameswari Krishnamurthy, and Dipti Sharma. 2024. Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 75–82, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages (Bahad et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-srw.9.pdf
Video: https://aclanthology.org/2024.naacl-srw.9.mp4