Mandal Anupam


2023

Neural language model embeddings for Named Entity Recognition: A study from language perspective
Maurya Muskaan | Mandal Anupam | Maurya Manoj | Gupta Naval | Nayak Somya
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Named entity recognition (NER) models based on neural language models (LMs) exhibit state-of-the-art performance. However, the performance of such LMs has not been studied in detail with respect to finer language-related aspects in the context of NER tasks. Such a study will be helpful for the effective application of these models to cross-lingual and multilingual NER tasks. In this study, we examine the effects of script, vocabulary sharing, foreign names, and pooling of multi-language training data for building NER models. It is observed that monolingual BERT embeddings show the highest recognition accuracy among all transformer-based LMs for monolingual NER models. It is also seen that vocabulary sharing and data augmentation with foreign named entities (NEs) are most effective in improving the accuracy of cross-lingual NER models. Multilingual NER models trained by pooling data from similar languages can address training data inadequacy and exhibit performance close to that of monolingual models trained with adequate NER-tagged data of a single language.