Shriram Hegde


Parts of Speech Tagging for Kannada
Swaroop L R | Rakshith Gowda G S | Sourabh U | Shriram Hegde
Proceedings of the Student Research Workshop Associated with RANLP 2019

Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag to every word in a sentence. In this paper, we present a POS tagger for Kannada, a low-resource South Asian language, using Conditional Random Fields. The tagger uses novel features native to the Kannada language. These include Sandhi splitting, in which a compound word is broken down into two or more meaningful constituent words. The proposed model is trained and tested on a tagged dataset of 21 thousand sentences and achieves a highest accuracy of 94.56%.
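
As a rough illustration of how Sandhi-split information could be fed to a Conditional Random Field tagger, the sketch below builds one feature dictionary per token. It is not the authors' feature set: the function names, the feature keys, and the `sandhi_table` lookup (a hypothetical mapping from a compound word to its constituents) are all assumptions made for this example, and the toy tokens are placeholders rather than real Kannada words.

```python
from typing import Dict, List, Mapping, Sequence, Union

Feature = Union[str, bool, int, float]


def token_features(
    tokens: Sequence[str],
    i: int,
    sandhi_table: Mapping[str, List[str]],
) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i].

    sandhi_table is a hypothetical lookup from a compound word to its
    constituent words; a real system would use a Sandhi splitter instead.
    """
    word = tokens[i]
    # Words absent from the table are treated as a single constituent.
    parts = sandhi_table.get(word, [word])

    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],
        # Sandhi-derived features (assumed, for illustration only).
        "is_compound": len(parts) > 1,
        "n_sandhi_parts": len(parts),
        "sandhi_first": parts[0],
        "sandhi_last": parts[-1],
    }

    # Context features from neighbouring tokens.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(
    tokens: Sequence[str],
    sandhi_table: Mapping[str, List[str]],
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i, sandhi_table) for i in range(len(tokens))]
```

A list of such per-sentence feature lists, paired with the gold tag sequences, is the kind of input that dictionary-based CRF toolkits generally expect; the choice of toolkit and training settings is left open here.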