Jyoti Pawar

Also published as: Jyoti D. Pawar, Jyoti D Pawar


2024

Identification of Idiomatic Expressions in Konkani Language Using Neural Networks
Naziya Mahamdul Shaikh | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Identifying and processing multi-word expressions poses a considerable challenge for natural language processing applications. One related subtask is labelling sentences that contain idiomatic expressions as literal or idiomatic in sense. Konkani, a regional Indian language spoken in the states on the west coast of India, lacks research on idiom processing. We aim to bridge this gap by contributing an idiom identification method for the Konkani language. This paper classifies the usage of idiomatic expressions in Konkani as idiomatic or literal using a neural network-based setup. The developed system performed the identification task with an accuracy of 79.5% and an F1-score of 0.77.

Konkani Wordnet Visualizer as a Concept Teaching-Learning Tool
Sunayana R. Gawde | Jayram Ulhas Gawas | Shrikrishna R. Parab | Shilpa Neenad Desai | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

The Visualizer is a tree-structured tool designed to browse and explore the Konkani WordNet lexical database. We propose to utilise this tool as a concept teaching and learning resource for Konkani, to be used by both teachers and students. It can also be used to add missing semantic and lexical relations, thus enhancing the wordnet. It extracts related concepts for a given word and displays them as a sub-tree. The interface includes various features to offer users greater flexibility in navigating and understanding word relationships. We attempted to enrich the Konkani Wordnet qualitatively with a Visualizer that offers improved usability and is incorporated into the Konkani Wordnet website for public use. The Visualizer is designed to provide graphical representations of words and their semantic relationships, making it easier to explore connections and meanings within the lexical database.

Pronominal Anaphora Resolution in Konkani language incorporating Gender Agreement
Poonam A. Navelker | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Konkani is a low-resource language, spoken mainly on the central west coast of India. Approximately 2.3 million people speak Konkani (Office of the Registrar General Census Commissioner, India, 2011). It is also the official language of the state of Goa. It belongs to the Southern Indo-Aryan language group. The official script for writing the Konkani language is Devanagari. Despite this, its low-resource status has hampered its development on digital platforms, and Konkani has yet to establish a significant digital presence. To improve this situation, contribution to Natural Language Understanding in the Konkani language is important. This paper aims to resolve pronominal anaphora in the Konkani language using a rule-based method incorporating gender agreement. This is required in NLP applications like text summarization, machine translation, and question-answering systems. While such research exists for English and other foreign languages, as well as for Indian languages like Tamil, Kannada, Malayalam, Bengali, and Marathi, no work has been done on the Konkani language thus far. This is the very first attempt made to resolve anaphora in Konkani.
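As a rough illustration of what a gender-agreement rule can look like, the sketch below picks the nearest preceding mention whose gender matches the pronoun. It is not the paper's rule set: the `Mention` class, the `m`/`f`/`n` tag set, the "nearest match wins" heuristic, and the English placeholder tokens are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mention:
    text: str
    gender: str    # hypothetical tag set: "m", "f", or "n"
    position: int  # token index within the discourse


def resolve_pronoun(pronoun: Mention,
                    candidates: Sequence[Mention]) -> Optional[Mention]:
    """Return the nearest preceding candidate that agrees in gender, if any."""
    agreeing = [
        c for c in candidates
        if c.position < pronoun.position and c.gender == pronoun.gender
    ]
    # Assumed heuristic: prefer the closest agreeing antecedent.
    return max(agreeing, key=lambda c: c.position, default=None)


# Placeholder tokens stand in for Konkani noun phrases and pronouns.
candidates = [Mention("Sita", "f", 0), Mention("Ram", "m", 2)]
antecedent = resolve_pronoun(Mention("she", "f", 5), candidates)
unresolved = resolve_pronoun(Mention("it", "n", 5), candidates)
```

A real system would likely combine gender with number and person agreement and with syntactic constraints; this sketch isolates the gender filter only.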

Shabdocchar: Konkani WordNet Enrichment with Audio Feature
Sunayana R. Gawde | Shrikrishna R. Parab | Jayram Ulhas Gawas | Shilpa Neenad Desai | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Konkani WordNet, also called Konkani Shabdamalem, was created as part of the Indradhanush WordNet Project Consortium between August 2010 and October 2013. Currently, the Konkani WordNet includes about 32,370 synsets and 37,719 unique words. There is a need to enhance the Konkani WordNet both quantitatively and qualitatively. In this paper we present a game-based crowdsourcing approach that we adopted to add an audio feature to the Konkani WordNet. The approach has increased the number of users who use and are exposed to the capabilities of the Konkani WordNet, both to aid the teaching and learning of Konkani and to create resources for further research. Our work has resulted in an audio corpus of 37,719 unique words, which we have named ‘Shabdocchar’, created within a short span of four months and covering five dialects of Konkani. We are confident that Shabdocchar will prove to be a very useful resource to support future research on dialects of Konkani and to support voice-based search of words in the wordnet. This approach can be adopted to enhance other wordnets as well.

Aspect-based Summaries from Online Product Reviews: A Comparative Study using various LLMs
Pratik Deelip Korkankar | Alvyn Abranches | Pradnya Bhagat | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

In the era of online shopping, the volume of product reviews on e-commerce platforms increases massively every day. Any given product attracts a flood of reviews, and manually analysing each of them to understand the important aspects or opinions associated with the product is a difficult and time-consuming task. Furthermore, it becomes nearly impossible for the customer to decide whether or not to buy the product. Thus, it becomes necessary to have an aspect-based summary generated from these user reviews, which can act as a guide for the interested buyer in decision-making. Recently, the use of Large Language Models (LLMs) has shown great potential for solving diverse Natural Language Processing (NLP) tasks, including the task of summarization. Our paper explores the use of various LLMs such as Llama3, GPT-4o, Gemma2, Mistral, Mixtral and Qwen2 on the publicly available domain-specific Amazon reviews dataset as a part of our experimentation work. Our study proposes an algorithm to identify product aspects accurately and assesses each model’s ability to extract relevant information and generate concise summaries. Further, we analyzed the experimental results of each of these LLMs with summary evaluation metrics such as Rouge, Meteor, BERTScore F1 and GPT-4o to evaluate the quality of the generated aspect-based summary. Our study highlights the strengths and limitations of each of these LLMs, thereby giving valuable insights for guiding researchers in harnessing LLMs for generating aspect-based summaries of user products present on these online shopping platforms.

Sentiment Analysis for Konkani using Zero-Shot Marathi Trained Neural Network Model
Rohit M. Ghosarwadkar | Seamus Fred Rodrigues | Pradnya Bhagat | Alvyn Abranches | Pratik Deelip Korkankar | Jyoti Pawar
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Sentiment Analysis plays a crucial role in understanding user opinions in various languages. The paper presents an experiment with a sentiment analysis model fine-tuned on Marathi sentences to classify sentiments into positive, negative, and neutral categories. The fine-tuned model shows high accuracy when tested on Konkani sentences, despite not being explicitly trained on Konkani data, since Marathi is a language very close to Konkani. This outcome highlights the effectiveness of zero-shot learning, where the model generalizes well across linguistically similar languages. Evaluation metrics such as accuracy, balanced accuracy, negative accuracy, neutral accuracy, positive accuracy and confusion matrix scores were used to assess the performance, with Konkani sentences demonstrating superior results. These findings indicate that zero-shot sentiment analysis can be a powerful tool for sentiment classification in resource-poor languages like Konkani, where labeled data is limited. The method can be used to generate datasets for resource-poor languages. Furthermore, this suggests that leveraging linguistically similar languages can help generate datasets for low-resource languages, enhancing sentiment analysis capabilities where labeled data is scarce. By utilizing related languages, zero-shot models can achieve meaningful performance without the need for extensive labeled data for the target language.
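The metrics named in this abstract can be sketched in a few lines. The abstract does not define "negative/neutral/positive accuracy", so the code below assumes they mean the fraction of gold examples of each class that were predicted correctly (per-class recall), with balanced accuracy as the mean of those values. The function names and toy labels are illustrative only.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def accuracy(y_true, y_pred):
    """Overall fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, labels=LABELS):
    """Assumed definition: correct predictions per class / gold count per class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Classes absent from the gold labels are skipped to avoid dividing by zero.
    return {lab: correct[lab] / total[lab] for lab in labels if total[lab]}


def balanced_accuracy(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class scores."""
    scores = per_class_accuracy(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Toy labels, not data from the paper.
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
overall = accuracy(gold, pred)
by_class = per_class_accuracy(gold, pred)
balanced = balanced_accuracy(gold, pred)
```

Balanced accuracy is useful here because a test set skewed toward one sentiment class can make plain accuracy look better than the model's behaviour on the minority classes.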

Konidioms Corpus: A Dataset of Idioms in Konkani Language
Naziya Mahamdul Shaikh | Jyoti D. Pawar | Mubarak Banu Sayed
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Konkani is a language spoken by a large number of people in the states on the west coast of India. It is the official language of the Indian state of Goa. Currently, there is no idiom corpus for the low-resource Konkani language. This paper aims to advance idiomatic sentence identification, and thereby enhance linguistic processing, by creating the first corpus of idioms in the Konkani language. We select a unique list of 1597 idioms from multiple sources and proceed with a strictly controlled sentence creation procedure through crowdsourcing. This is followed by a quality check of the sentences and an annotation procedure carried out by experts in the Konkani language. We were able to build a good-quality corpus comprising 6520 sentences written in the Devanagari script of the Konkani language. Analysis of the collected idioms and their usage in the created sentences revealed the dominance of selected domains like ‘human body’ in the creation and occurrence of idiomatic expressions in the Konkani language. This corpus is made publicly available.

2017

IITP at EmoInt-2017: Measuring Intensity of Emotions using Sentence Embeddings and Optimized Features
Md Shad Akhtar | Palaash Sawant | Asif Ekbal | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper describes the system that we submitted as part of our participation in the shared task on Emotion Intensity (EmoInt-2017). We propose a Long Short-Term Memory (LSTM) based architecture cascaded with a Support Vector Regressor (SVR) for intensity prediction. We also employ a Particle Swarm Optimization (PSO) based feature selection algorithm to obtain an optimized feature set for training and evaluation. System evaluation shows interesting results on the four emotion datasets, i.e., anger, fear, joy and sadness. In comparison to the other participating teams, our system was ranked 5th in the competition.

2016

IndoWordNet Conversion to Web Ontology Language (OWL)
Apurva Nagvenkar | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

WordNet plays a significant role in the Linked Open Data (LOD) cloud. It has numerous applications, ranging from ontology annotation to ontology mapping. IndoWordNet is a linked WordNet connecting 18 Indian language WordNets with Hindi as the source WordNet. The Hindi WordNet was initially developed by linking it to the English WordNet. In this paper, we present a data representation of IndoWordNet in Web Ontology Language (OWL). The schema of Princeton WordNet has been enhanced to support the representation of IndoWordNet. This IndoWordNet representation in OWL format is now available to link other web resources. This representation is implemented for eight Indian languages.

Use of Semantic Knowledge Base for Enhancement of Coherence of Code-mixed Topic-Based Aspect Clusters
Kavita Asnani | Jyoti D Pawar
Proceedings of the 13th International Conference on Natural Language Processing

2015

Let Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi
Apurva Nagvenkar | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Logistic Regression for Automatic Lexical Level Morphological Paradigm Selection for Konkani Nouns
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

2014

Concept Space Synset Manager Tool
Apurva Nagvenkar | Neha Prabhugaonkar | Venkatesh Prabhu | Ramdas Karmali | Jyoti Pawar
Proceedings of the Seventh Global Wordnet Conference

Use of Sense Marking for Improving WordNet Coverage
Neha Prabhugaonkar | Jyoti Pawar
Proceedings of the Seventh Global Wordnet Conference

Proceedings of the 11th International Conference on Natural Language Processing
Dipti Misra Sharma | Rajeev Sangal | Jyoti D. Pawar
Proceedings of the 11th International Conference on Natural Language Processing

AutoParSe: An Automatic Paradigm Selector For Nouns in Konkani
Shilpa Desai | Neenad Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

A Framework for Learning Morphology using Suffix Association Matrix
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

2012

Automated Paradigm Selection for FSA based Konkani Verb Morphological Analyzer
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

BIS Annotation Standards With Reference to Konkani Language
Madhavi Sardesai | Jyoti Pawar | Shantaram Walawalikar | Edna Vaz
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing