NLP Tools for Khasi, a low resource language

Medari Tham


Abstract
Khasi is an Austroasiatic language spoken by one of the tribes of Meghalaya, and also in parts of Assam and Bangladesh. Some NLP tools for Khasi are now available online for testing, the culmination of a considerable investment of time and effort. When work on Khasi began, resources such as a tagset, an annotated corpus, or NLP tools did not exist. As part of the author's ongoing doctoral work, the following resources for Khasi are now in place: the BIS (Bureau of Indian Standards) tagset for Khasi, a 90k annotated corpus, and NLP tools such as POS (part-of-speech) taggers and shallow parsers. This demonstration paper highlights these tools.
Anthology ID: 2020.icon-demos.10
Volume: Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Month: December
Year: 2020
Address: Patna, India
Editors: Vishal Goyal, Asif Ekbal
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 26–27
URL: https://aclanthology.org/2020.icon-demos.10
Cite (ACL): Medari Tham. 2020. NLP Tools for Khasi, a low resource language. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations, pages 26–27, Patna, India. NLP Association of India (NLPAI).
Cite (Informal): NLP Tools for Khasi, a low resource language (Tham, ICON 2020)
PDF: https://aclanthology.org/2020.icon-demos.10.pdf