Morpheme boundary Detection & Grammatical feature Prediction for Gujarati : Dataset & Model

Jatayu Baxi, Brijesh Bhatt


Abstract
Developing Natural Language Processing resources for a low-resource language is a challenging but essential task. In this paper, we present a morphological analyzer for Gujarati. We use a bidirectional LSTM (Bi-LSTM) based approach to perform morpheme boundary detection and grammatical feature tagging. We have created a dataset of Gujarati words annotated with lemma and grammatical features. The Bi-LSTM based morphological analyzer discussed in the paper handles the language's morphology effectively without relying on hand-crafted suffix rules. To the best of our knowledge, this is the first dataset and morphological analyzer model for the Gujarati language that performs both grammatical feature tagging and morpheme boundary detection.
Anthology ID:
2021.icon-main.45
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
369–377
URL:
https://aclanthology.org/2021.icon-main.45
Cite (ACL):
Jatayu Baxi and Brijesh Bhatt. 2021. Morpheme boundary Detection & Grammatical feature Prediction for Gujarati : Dataset & Model. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 369–377, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Morpheme boundary Detection & Grammatical feature Prediction for Gujarati : Dataset & Model (Baxi & Bhatt, ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.45.pdf