Development of Assamese Rule based Stemmer using WordNet

Jumi Sarmah, Shikhar Kumar Sarma, Anup Kumar Barman


Abstract
Stemming is a technique that reduces an inflected word to its root form. Assamese is a morphologically rich, scheduled Indian language in which a variety of suffixes attach to a word depending on context. Normalizing such inflected words can improve the performance of various Natural Language Processing (NLP) applications. This paper develops a look-up and rule-based suffix-stripping approach for the Assamese language using WordNet. The authors prepare a dictionary of root words extracted from the Assamese WordNet and from Named Entities. Appropriate stemming rules for inflected nouns and verbs are added to the rule engine, and the stemmed output is then tested against the morphological root words of the Assamese WordNet and Named Entities by computing Hamming distance. The resulting stemmer for the Assamese language achieves an accuracy of 85%. The authors also report an Information Retrieval (IR) system's performance when the Assamese stemmer is applied and demonstrate its efficiency by retrieving sense-oriented results for the submitted query. Thus, the morphological analyzer is expected to open the way for developing various Assamese NLP applications.
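The look-up and suffix-stripping pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the root dictionary, the suffix list, and the function names (stem, hamming, accuracy) are all hypothetical, and the word forms are romanized placeholders rather than data from the paper. The abstract does not say how Hamming distance is applied, so the sketch assumes a stem counts as correct when its distance to the reference root is 0.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical root dictionary; the paper builds its dictionary from
# Assamese WordNet and Named Entities. These romanized forms are placeholders.
ROOTS = {"manuh", "kitap", "ghar"}

# Hypothetical suffix list, for illustration only.
SUFFIXES = ["bor", "khon", "or", "e"]


def stem(word: str) -> str:
    """Return the root of `word` via dictionary look-up, then suffix stripping."""
    # Step 1: look-up. A word already in the dictionary is its own root.
    if word in ROOTS:
        return word
    # Step 2: rule-based stripping, trying the longest suffix first.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            # Accept the stripped form only if the dictionary confirms it.
            if candidate in ROOTS:
                return candidate
    # No rule applied: leave the word unchanged.
    return word


def hamming(a: str, b: str) -> Optional[int]:
    """Hamming distance for equal-length strings; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (word, reference_root) pairs whose stem matches the reference.

    Assumption: a match means Hamming distance 0 between stem and reference.
    """
    pairs = list(pairs)
    correct = sum(1 for word, gold in pairs if hamming(stem(word), gold) == 0)
    return correct / len(pairs)
```

Calling stem("manuhbor") with the toy data above strips the placeholder suffix "bor" and returns "manuh", because the stripped form is found in the dictionary. Checking each candidate against the dictionary keeps the rules from over-stripping words that merely happen to end in a suffix string.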
Anthology ID:
2019.gwc-1.17
Volume:
Proceedings of the 10th Global Wordnet Conference
Month:
July
Year:
2019
Address:
Wroclaw, Poland
Editors:
Piek Vossen, Christiane Fellbaum
Venue:
GWC
SIG:
SIGLEX
Publisher:
Global Wordnet Association
Pages:
135–139
URL:
https://aclanthology.org/2019.gwc-1.17
Cite (ACL):
Jumi Sarmah, Shikhar Kumar Sarma, and Anup Kumar Barman. 2019. Development of Assamese Rule based Stemmer using WordNet. In Proceedings of the 10th Global Wordnet Conference, pages 135–139, Wroclaw, Poland. Global Wordnet Association.
Cite (Informal):
Development of Assamese Rule based Stemmer using WordNet (Sarmah et al., GWC 2019)
PDF:
https://aclanthology.org/2019.gwc-1.17.pdf