Arindam Chatterjee


2022

PACMAN:PArallel CodeMixed dAta generatioN for POS tagging
Arindam Chatterjee | Chhavi Sharma | Ayush Raj | Asif Ekbal
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

Code-mixing, or code-switching, is the mixing of languages in the same context, observed predominantly in multilingual societies. Existing code-mixed datasets are small and consist primarily of social media text that does not adhere to standard spelling and grammar. Computational models built on such data fail to generalise to unseen code-mixed data. To address the lack of high-quality annotated code-mixed datasets, we explore the combined task of generating annotated code-mixed data and building computational models from the generated data, specifically for code-mixed Part-Of-Speech (POS) tagging. We introduce PACMAN (PArallel CodeMixed dAta generatioN), a synthetically generated code-mixed POS-tagged dataset with more than 50K samples, which is the largest annotated code-mixed dataset. We build POS taggers on the generated data using classical machine learning and deep learning techniques, and report an F1-score of 98% (8% above the current state of the art (SOTA)). To determine the efficacy of our data, we compare it against the existing benchmark in code-mixed POS tagging. PACMAN outperforms the benchmark, confirming that our dataset and, subsequently, our POS tagging models generalise and can handle even natural code-mixed and monolingual data.

2021

Towards Explainable Dialogue System: Explaining Intent Classification using Saliency Techniques
Ratnesh Joshi | Arindam Chatterjee | Asif Ekbal
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Deep learning based methods have shown tremendous success in several Natural Language Processing (NLP) tasks and have produced strong performance across several application areas. However, one major problem that most of these models face is a lack of transparency, i.e. the actual decision process of the underlying model is not explainable. In this paper, we first address a fundamental problem of Natural Language Understanding (NLU), intent detection, using a Bi-directional Long Short Term Memory (BiLSTM) network. To determine the defining features that lead to a specific intent class, we use the Layerwise Relevance Propagation (LRP) algorithm. In the process, we conclude that the eLRP (epsilon Layerwise Relevance Propagation) saliency method is effective at highlighting the input features responsible for a given classification, which yields significant insights into the inner workings of the black-box model, such as the reasons for misclassification.

2012

Eating Your Own Cooking: Automatically Linking Wordnet Synsets of Two Languages
Salil Joshi | Arindam Chatterjee | Arun Karthikeyan Karra | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

Discrimination-Net for Hindi
Diptesh Kanojia | Arindam Chatterjee | Salil Joshi | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

2011

Together We Can: Bilingual Bootstrapping for WSD
Mitesh M. Khapra | Salil Joshi | Arindam Chatterjee | Pushpak Bhattacharyya
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies