Enhancing Telugu Part-of-Speech Tagging with Deep Sequential Models and Multilingual Embeddings

Sai Rishith Reddy Mangamuru; Sai Prashanth Karnati; Bala Karthikeya Sajja; Divith Phogat; Premjith B

Abstract

Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP) that assigns a grammatical category to each word in a sentence. In this study, we investigate deep sequential models for POS tagging of Telugu, a low-resource Dravidian language with rich morphology. We use the Universal Dependencies dataset and explore several deep learning architectures for the task: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), and their stacked variants. Additionally, we use multilingual BERT and IndicBERT embeddings to capture contextual information from the input sequences. Our experiments show that a stacked LSTM with multilingual BERT embeddings performs best, outperforming the other approaches with an F1 score of 0.8812. These findings suggest that deep sequential models, particularly stacked LSTMs with multilingual BERT embeddings, are effective tools for POS tagging in Telugu.
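To make the task setup concrete, the sketch below shows one way to turn Universal Dependencies data into (word, tag) sequences of the kind a sequence tagger is trained on. It relies only on the standard CoNLL-U layout (ten tab-separated columns, with the word form in column 2 and the universal POS tag in column 4). The function name `read_conllu` and the sample sentence are illustrative assumptions; the sample is hand-written and is not taken from the Telugu treebank or from the authors' pipeline.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (word form, UPOS tag) pairs


def read_conllu(lines: Iterable[str]) -> List[Sentence]:
    """Collect (FORM, UPOS) pairs per sentence from CoNLL-U lines.

    Hypothetical helper for illustration, not the authors' code.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line marks the end of a sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." carry no tags.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id, form, upos = cols[0], cols[1], cols[3]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        current.append((form, upos))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample sentence ("I read a book."), for illustration only.
SAMPLE = (
    "# text = నేను పుస్తకం చదివాను .\n"
    "1\tనేను\t_\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tపుస్తకం\t_\tNOUN\t_\t_\t3\tobj\t_\t_\n"
    "3\tచదివాను\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = read_conllu(SAMPLE.splitlines())
words = [form for form, _ in sentences[0]]
tags = [upos for _, upos in sentences[0]]
```

Each sentence becomes a pair of aligned lists, `words` and `tags`, which is the input/target format a tagger such as a stacked LSTM over contextual embeddings would consume.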

Anthology ID: 2023.icon-1.77
Volume: Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2023
Address: Goa University, Goa, India
Editors: Jyoti D. Pawar, Sobha Lalitha Devi
Venue: ICON
SIG: SIGLEX
Publisher: NLP Association of India (NLPAI)
Pages: 760–765
URL: https://aclanthology.org/2023.icon-1.77/
Cite (ACL): Sai Rishith Reddy Mangamuru, Sai Prashanth Karnati, Bala Karthikeya Sajja, Divith Phogat, and Premjith B. 2023. Enhancing Telugu Part-of-Speech Tagging with Deep Sequential Models and Multilingual Embeddings. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 760–765, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal): Enhancing Telugu Part-of-Speech Tagging with Deep Sequential Models and Multilingual Embeddings (Mangamuru et al., ICON 2023)
PDF: https://aclanthology.org/2023.icon-1.77.pdf
