Sandhya Singh

2022

Verb Phrase Anaphora (VPA) is a universal language phenomenon. It can occur in the form of do so phrase, verb phrase ellipsis, etc. Resolving VPA can improve the performance of Dialogue processing systems, Natural Language Generation (NLG), Question Answering (QA) and so on. In this paper, we present a novel computational approach to resolve the specific verb phrase anaphora appearing as do so construct and its lexical variations for the English language. The approach follows a heuristic technique using a combination of parsing from classical NLP, state-of-the-art (SOTA) Generative Pre-trained Transformer (GPT) language model and RoBERTa grammar correction model. The result indicates that our approach can resolve these specific verb phrase anaphora cases with 73.40 F1 score. The data set used for testing the specific verb phrase anaphora cases of do so and doing so is released for research purposes. This module has been used as the last module in a coreference resolution pipeline for a downstream QA task for the electronic home appliances sector.

Movies reflect society and also hold power to transform opinions. Social biases and stereotypes present in movies can cause extensive damage due to their reach. These biases are not always found to be the need of storyline but can creep in as the author’s bias. Movie production houses would prefer to ascertain that the bias present in a script is the story’s demand. Today, when deep learning models can give human-level accuracy in multiple tasks, having an AI solution to identify the biases present in the script at the writing stage can help them avoid the inconvenience of stalled release, lawsuits, etc. Since AI solutions are data intensive and there exists no domain specific data to address the problem of biases in scripts, we introduce a new dataset of movie scripts that are annotated for identity bias. The dataset contains dialogue turns annotated for (i) bias labels for seven categories, viz., gender, race/ethnicity, religion, age, occupation, LGBTQ, and other, which contains biases like body shaming, personality bias, etc. (ii) labels for sensitivity, stereotype, sentiment, emotion, emotion intensity, (iii) all labels annotated with context awareness, (iv) target groups and reason for bias labels and (v) expert-driven group-validation process for high quality annotations. We also report various baseline performances for bias identification and category detection on our dataset.

2018

This paper reports the work related to making Hindi Wordnet1 available as a digital resource for language learning and teaching, and the experiences and lessons that were learnt during the process. The language data of the Hindi Wordnet has been suitably modified and enhanced to make it into a language learning aid. This aid is based on modern pedagogical axioms and is aligned to the learning objectives of the syllabi of the school education in India. To make it into a comprehensive language tool, grammatical information has also been encoded, as far as these can be marked on the lexical items. The delivery of information is multi-layered, multi-sensory and is available across multiple digital platforms. The front end has been designed to offer an eye-catching user-friendly interface which is suitable for learners starting from age six onward. Preliminary testing of the tool has been done and it has been modified as per the feedbacks that were received. Above all, the entire exercise has offered gainful insights into learning based on associative networks and how knowledge based on such networks can be made available to modern learners.

pdf bib
Does Curriculum Learning help Deep Learning for Natural Language Generation?
Sandhya Singh | Kevin Patel | Pushpak Bhattacharya | Krishnanjan Bhattacharjee | Hemant Darbari | Seema Verma
Proceedings of the 15th International Conference on Natural Language Processing

2017

pdf bib abs
Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation
Sandhya Singh | Ritesh Panjwani | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we empirically compare the two encoder-decoder neural machine translation architectures: convolutional sequence to sequence model (ConvS2S) and recurrent sequence to sequence model (RNNS2S) for English-Hindi language pair as part of IIT Bombay’s submission to WAT2017 shared task. We report the results for both English-Hindi and Hindi-English direction of language pair.

pdf bib abs
Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Meenakshi Somasundaram | Dhara Gorasia | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

In today’s technology driven digital era, education domain is undergoing a transformation from traditional approaches to more learner controlled and flexible methods of learning. This transformation has opened the new avenues for interdisciplinary research in the field of educational technology and natural language processing in developing quality digital aids for learning and teaching. The tool presented here - Hindi Shabhadamitra, developed using Hindi Wordnet for Hindi language learning, is one such e-learning tool. It has been developed as a teaching and learning aid suitable for formal school based curriculum and informal setup for self learning users. Besides vocabulary, it also provides word based grammar along with images and pronunciation for better learning and retention. This aid demonstrates that how a rich lexical resource like wordnet can be systematically remodeled for practical usage in the educational domain.

pdf bib
Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Dhara Gorasia | Meenakshi Somasundaram | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

Samāsa or compounds are a regular feature of Indian Languages. They are also found in other languages like German, Italian, French, Russian, Spanish, etc. Compound word is constructed from two or more words to form a single word. The meaning of this word is derived from each of the individual words of the compound. To develop a system to generate, identify and interpret compounds, is an important task in Natural Language Processing. This paper introduces a web based tool - Samāsa-Kartā for producing compound words. Here, the focus is on Sanskrit language due to its richness in usage of compounds; however, this approach can be applied to any Indian language as well as other languages. IndoWordNet is used as a resource for words to be compounded. The motivation behind creating compound words is to create, to improve the vocabulary, to reduce sense ambiguity, etc. in order to enrich the WordNet. The Samāsa-Kartā can be used for various applications viz., compound categorization, sandhi creation, morphological analysis, paraphrasing, synset creation, etc.

pdf bib abs
IIT Bombay’s English-Indonesian submission at WAT: Integrating Neural Language Models with SMT
Sandhya Singh | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes the IIT Bombay’s submission as a part of the shared task in WAT 2016 for English–Indonesian language pair. The results reported here are for both the direction of the language pair. Among the various approaches experimented, Operation Sequence Model (OSM) and Neural Language Model have been submitted for WAT. The OSM approach integrates translation and reordering process resulting in relatively improved translation. Similarly the neural experiment integrates Neural Language Model with Statistical Machine Translation (SMT) as a feature for translation. The Neural Probabilistic Language Model (NPLM) gave relatively high BLEU points for Indonesian to English translation system while the Neural Network Joint Model (NNJM) performed better for English to Indonesian direction of translation system. The results indicate improvement over the baseline Phrase-based SMT by 0.61 BLEU points for English-Indonesian system and 0.55 BLEU points for Indonesian-English translation system.

2015

pdf bib
IndoWordNet Dictionary: An Online Multilingual Dictionary using IndoWordNet
Hanumant Redkar | Sandhya Singh | Nilesh Joshi | Anupam Ghosh | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Co-authors

Venues

Fix author