PoS to UPoS Conversion and Creation of UPoS Tagged Resources for Assamese Language

Talukdar Kuwali, Sarma Shikhar Kumar


Abstract
This paper addresses the transition from traditional Part-of-Speech (PoS) tagging to Universal Part-of-Speech (UPoS) tagging for the Assamese language. It outlines a methodology for PoS to UPoS conversion and for the creation of UPoS tagged resources, bridging the gap between language-specific linguistic analysis and universal standards. The work has the potential to enhance natural language processing and understanding for Assamese and to contribute to broader multilingual applications. The paper details the data preparation and creation processes, annotation methods, and evaluation techniques, and discusses the challenges and opportunities that arise in pursuing linguistic universality. The findings have implications for improving Assamese language technology and can serve as a model for similar work in other regional languages. A standard PoS tagset applicable to Indian languages is mapped to the primary categories of the UPoS tagset with respect to the lexical behaviour of Assamese. The paper then presents, in a structured flow, the conversion of a PoS tagged text corpus to a UPoS tagged corpus using this mapping, and the use of a Deep Learning based model trained on the resulting dataset to create a sizable UPoS tagged corpus. This paper is a step towards a more standardized, universal understanding of linguistic elements in a diverse and multilingual world.
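The mapping-based conversion step described in the abstract can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration: the source tag names follow a BIS-style scheme for Indian languages, the pairings are a small plausible subset and not the paper's actual mapping table, and `POS_TO_UPOS`, `convert_sentence`, and the placeholder tokens are hypothetical names. Only the target labels (NOUN, PROPN, VERB, and so on) come from the standard UPoS inventory.

```python
# Illustrative subset only: BIS-style source tags are an assumption,
# and these pairings are not the mapping published in the paper.
POS_TO_UPOS = {
    "N_NN": "NOUN",      # common noun
    "N_NNP": "PROPN",    # proper noun
    "PR_PRP": "PRON",    # personal pronoun
    "V_VM": "VERB",      # main verb
    "V_VAUX": "AUX",     # auxiliary verb
    "JJ": "ADJ",         # adjective
    "RB": "ADV",         # adverb
    "PSP": "ADP",        # postposition
    "CC_CCD": "CCONJ",   # coordinating conjunction
    "CC_CCS": "SCONJ",   # subordinating conjunction
    "QT_QTC": "NUM",     # cardinal number
    "RD_PUNC": "PUNCT",  # punctuation
}


def convert_sentence(tagged, mapping=POS_TO_UPOS, fallback="X"):
    """Map each (token, pos) pair to (token, upos).

    Tags missing from the mapping receive the UPoS catch-all tag "X"
    so that unmapped cases stay visible for manual review.
    """
    return [(token, mapping.get(pos, fallback)) for token, pos in tagged]


# Placeholder tokens stand in for real Assamese words.
sentence = [("tok1", "N_NNP"), ("tok2", "N_NN"), ("tok3", "V_VM"), (".", "RD_PUNC")]
converted = convert_sentence(sentence)
```

A one-to-one lookup like this only covers source tags whose UPoS category does not depend on context; tags that can map to more than one UPoS category would need additional rules or manual annotation.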
Anthology ID:
2023.icon-1.38
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
450–459
URL:
https://aclanthology.org/2023.icon-1.38
Cite (ACL):
Talukdar Kuwali and Sarma Shikhar Kumar. 2023. PoS to UPoS Conversion and Creation of UPoS Tagged Resources for Assamese Language. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 450–459, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
PoS to UPoS Conversion and Creation of UPoS Tagged Resources for Assamese Language (Kuwali & Shikhar Kumar, ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.38.pdf