Nilesh Joshi
2020
Part-of-Speech Annotation Challenges in Marathi
Gajanan Rane | Nilesh Joshi | Geetanjali Rane | Hanumant Redkar | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation
Part-of-Speech (POS) annotation is a significant challenge in natural language processing. This paper discusses the issues and challenges faced during POS annotation of Marathi data from four domains: tourism, health, entertainment and agriculture. Many issues were encountered during annotation, and the major ones are discussed in detail. The two approaches to POS tagging, the lexical (L approach) and the functional (F approach), are also discussed and illustrated with examples. Finally, some ambiguous cases in POS annotation are presented.
2019
Introduction to Sanskrit Shabdamitra: An Educational Application of Sanskrit Wordnet
Malhar Kulkarni | Nilesh Joshi | Sayali Khare | Hanumant Redkar | Pushpak Bhattacharyya
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium
2016
Samāsa-Kartā: An Online Tool for Producing Compound Words using IndoWordNet
Hanumant Redkar | Nilesh Joshi | Sandhya Singh | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)
Samāsa, or compounds, are a regular feature of Indian languages. They are also found in other languages such as German, Italian, French, Russian and Spanish. A compound word is constructed from two or more words to form a single word, and its meaning is derived from the individual words of the compound. Developing a system to generate, identify and interpret compounds is an important task in Natural Language Processing. This paper introduces a web-based tool, Samāsa-Kartā, for producing compound words. The focus is on Sanskrit because of its rich use of compounds; however, the approach can be applied to any Indian language as well as to other languages. IndoWordNet is used as the resource for the words to be compounded. The motivation behind creating compound words is to improve the vocabulary, reduce sense ambiguity, etc., in order to enrich the WordNet. Samāsa-Kartā can be used for various applications, viz., compound categorization, sandhi creation, morphological analysis, paraphrasing, synset creation, etc.
Verbframator: Semi-Automatic Verb Frame Annotator Tool with Special Reference to Marathi
Hanumant Redkar | Sandhya Singh | Nandini Ghag | Jai Paranjape | Nilesh Joshi | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing
2015
IndoWordNet Dictionary: An Online Multilingual Dictionary using IndoWordNet
Hanumant Redkar | Sandhya Singh | Nilesh Joshi | Anupam Ghosh | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing