Parts of Speech (PoS) and Universal Parts of Speech (UPoS) Tagging: A Critical Review with Special Reference to Low Resource Languages

Talukdar Kuwali, Sarma Shikhar Kumar, Bhuyan Manash Pratim


Abstract
Universal Parts of Speech (UPoS) tags are parts of speech annotations used in Universal Dependencies. Universal Dependency (UD) helps in developing cross-linguistically consistent treebank annotations for multiple languages with a common framework and standard. For various Natural Language Processing (NLP) tasks and research such as semantic parsing, syntactic parsing as well as linguistic parsing, UD treebanks are becoming increasingly important resources. A lot of interest has been seen in adopting UD and UPoS standards and resources for integrating with various NLP techniques, including Machine Translations, Question Answering, Sentiment Analysis etc. Consequently, a wide variety of Artificial Intelligence (AI) and NLP tools are being created with UD and UPoS standards on board. Part of Speech (PoS) tagging is one of the fundamental NLP tasks, which labels a specific sentence or set of words in a paragraph with lexical and grammatical annotations, based on the context of the sentence. Contemporary Machine Learning (ML) and Deep Learning (DL) techniques require god quality tagged resources for training potential tagger models. Low resource languages face serious challenges here. This paper discusses about the UPoS in UD and presents a concise yet inclusive piece of literature regarding UPoS, PoS, and various taggers for multiple languages with special reference to various low resource languages. Already adopted approaches and models developed for different low resource languages are included in this review, considering representations from a wide variety of languages. Also, the study offers a comprehensive classification based on the well-known ML and DL techniques used in the development of part-of-speech taggers. This will serve as a ready-reference for understanding nuances of PoS and UPoS tagging.
Anthology ID:
2023.icon-1.70
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Note:
Pages:
703–713
Language:
URL:
https://aclanthology.org/2023.icon-1.70
DOI:
Bibkey:
Cite (ACL):
Talukdar Kuwali, Sarma Shikhar Kumar, and Bhuyan Manash Pratim. 2023. Parts of Speech (PoS) and Universal Parts of Speech (UPoS) Tagging: A Critical Review with Special Reference to Low Resource Languages. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 703–713, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
Parts of Speech (PoS) and Universal Parts of Speech (UPoS) Tagging: A Critical Review with Special Reference to Low Resource Languages (Kuwali et al., ICON 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.icon-1.70.pdf