Divyanshu Aggarwal


2023

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages
Divyanshu Aggarwal | Vivek Gupta | Anoop Kunchukuttan
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)

Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets for complex structured tasks such as semantic parsing. One reason for this persistent gap is the complexity of the logical form, which makes English-to-multilingual translation difficult: the process requires aligning logical forms, intents and slots with the translated unstructured utterance. To address this, we propose IE-SemParse Suite, an inter-bilingual seq2seq semantic parsing dataset for 11 distinct Indian languages. We highlight the proposed task’s practicality and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiments reveal a high correlation between performance on existing multilingual semantic parsing datasets (such as mTOP, multilingual TOP and multiATIS++) and on our proposed IE-SemParse suite.
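A minimal sketch of how exact-match evaluation on such a dataset could look, assuming a multilingual seq2seq checkpoint fine-tuned on IE-SemParse-style (Indic utterance, English logical form) pairs; the model choice, the Hindi utterance and the TOP-style logical form below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch: exact-match evaluation of a multilingual seq2seq model
# on inter-bilingual semantic parsing (Indic utterance in, English logical form out).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Base checkpoint used for illustration; for meaningful output it would first
# be fine-tuned on IE-SemParse-style pairs.
MODEL = "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def exact_match(utterances, gold_logical_forms):
    """Fraction of generated logical forms that match the gold ones exactly."""
    hits = 0
    for utt, gold in zip(utterances, gold_logical_forms):
        inputs = tokenizer(utt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=64)
        pred = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        hits += int(pred.strip() == gold.strip())
    return hits / len(utterances)

# Illustrative Hindi utterance paired with a TOP-style English logical form.
print(exact_match(["कल का मौसम कैसा रहेगा"],
                  ["[IN:GET_WEATHER [SL:DATE_TIME tomorrow ] ]"]))
```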

2022

XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference
Bhavnick Minhas | Anant Shankhdhar | Vivek Gupta | Divyanshu Aggarwal | Shuo Zhang
Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER)

The ability to reason about tabular or semi-structured knowledge is a fundamental problem for today’s Natural Language Processing (NLP) systems. While significant progress has been made on tabular reasoning, these advances are limited to English due to the absence of multilingual benchmark datasets for semi-structured data. In this paper, we use machine translation methods to construct a multilingual tabular NLI dataset, XINFOTABS, which expands the English tabular NLI dataset INFOTABS to ten diverse languages. We also present several baselines for multilingual tabular reasoning, e.g., machine translation-based and cross-lingual transfer methods. We find that the XINFOTABS evaluation suite is both practical and challenging. As a result, this dataset should contribute to increased linguistic inclusion in tabular reasoning research and applications.
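As a rough illustration of translation-based dataset construction (not the paper’s exact pipeline), the sketch below translates an INFOTABS-style key-value table and its hypothesis into Hindi with a public MT checkpoint; the toy table, hypothesis and model choice are assumptions made for this example:

```python
# Illustrative sketch: building one Hindi tabular NLI instance by machine
# translation; the paper's actual translation system and quality filters may differ.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

table = {"Born": "28 June 1971", "Occupation": "Engineer"}  # toy premise table
hypothesis = "The person was born in the summer."           # toy hypothesis

def to_target_language(table, hypothesis):
    """Translate each cell and the hypothesis independently, keeping table structure."""
    translated_table = {
        translate(k)[0]["translation_text"]: translate(v)[0]["translation_text"]
        for k, v in table.items()
    }
    return translated_table, translate(hypothesis)[0]["translation_text"]

print(to_target_language(table, hypothesis))
```

Translating cells independently preserves the key-value structure of the premise, which is what distinguishes tabular NLI from ordinary sentence-pair NLI.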

IndicXNLI: Evaluating Multilingual Inference for Indian Languages
Divyanshu Aggarwal | Vivek Gupta | Anoop Kunchukuttan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

While Indic NLP has recently made rapid advances in the availability of corpora and pre-trained models, benchmark datasets for standard NLU tasks remain limited. To this end, we introduce INDICXNLI, an NLI dataset for 11 Indic languages, created by high-quality machine translation of the original English XNLI dataset; our analysis attests to the quality of INDICXNLI. By finetuning different pre-trained LMs on INDICXNLI, we analyze various cross-lingual transfer techniques, examining the impact of the choice of language model, language, multilinguality, mixed-language input, etc. These experiments provide useful insights into the behaviour of pre-trained models for a diverse set of languages.
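For a concrete picture of zero-shot cross-lingual transfer on such a benchmark, the sketch below scores a Hindi premise-hypothesis pair with a publicly available XNLI-finetuned XLM-R checkpoint; the checkpoint and the example pair are illustrative, not the paper’s setup:

```python
# Zero-shot cross-lingual NLI: a model fine-tuned on (mostly English) NLI data
# classifying a Hindi premise-hypothesis pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "joeddav/xlm-roberta-large-xnli"  # public XNLI-finetuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

premise = "मैं कल दिल्ली गया था।"      # "I went to Delhi yesterday."
hypothesis = "मैंने कल यात्रा की।"     # "I travelled yesterday."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax())])  # expected: entailment
```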

2021

Fine-tuning Distributional Semantic Models for Closely-Related Languages
Kushagra Bhatia | Divyanshu Aggarwal | Ashwini Vaidya
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

In this paper we compare the performance of three models on a word similarity task: SGNS (skip-gram with negative sampling) and augmented versions of SVD (singular value decomposition) and PPMI (positive pointwise mutual information). We particularly focus on the role of hyperparameter tuning for Hindi, based on recommendations made in previous work on English. Our results show that there are language-specific preferences for these hyperparameters. We extend the best settings for Hindi to a set of related languages, Punjabi, Gujarati and Marathi, with favourable results. We also find that a suitably tuned SVD model outperforms SGNS for most of our languages and is also more robust in a low-resource setting.
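To make the PPMI and SVD side of the comparison concrete, here is a toy sketch that builds PPMI-based word vectors via truncated SVD and scores word similarity; the corpus, window size and vector dimensionality stand in for the hyperparameters the paper tunes and are purely illustrative:

```python
# Toy PPMI -> truncated-SVD word vectors with a cosine similarity check.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
window = 2  # context window: one of the hyperparameters tuned per language

# Symmetric co-occurrence counts within the window.
counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            counts[idx[w], idx[corpus[j]]] += 1

# Positive pointwise mutual information.
total = counts.sum()
marginal = counts.sum(axis=1) / total
with np.errstate(divide="ignore"):
    pmi = np.log((counts / total) / np.outer(marginal, marginal))
ppmi = np.maximum(pmi, 0)  # log(0) = -inf entries clip to 0

# Truncated SVD yields dense word vectors.
U, S, _ = np.linalg.svd(ppmi)
dim = 4  # vector dimensionality: another tunable hyperparameter
vectors = U[:, :dim] * S[:dim]

def cosine(a, b):
    va, vb = vectors[idx[a]], vectors[idx[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9)

print(cosine("cat", "dog"))  # distributionally similar words score high
```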