Akshat Gupta


2022

pdf bib
On Building Spoken Language Understanding Systems for Low Resourced Languages
Akshat Gupta
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Spoken dialog systems are slowly becoming an integral part of the human experience due to their various advantages over textual interfaces. Spoken language understanding (SLU) systems are fundamental building blocks of spoken dialog systems. But creating SLU systems for low resourced languages is still a challenge. In a large number of low resourced language, we don’t have access to enough data to build automatic speech recognition (ASR) technologies, which are fundamental to any SLU system. Also, ASR based SLU systems do not generalize to unwritten languages. In this paper, we present a series of experiments to explore extremely low-resourced settings where we perform intent classification with systems trained on as low as one data-point per intent and with only one speaker in the dataset. We also work in a low-resourced setting where we do not use language specific ASR systems to transcribe input speech, which compounds the challenge of building SLU systems to simulate a true low-resourced setting. We test our system on Belgian Dutch (Flemish) and English and find that using phonetic transcriptions to make intent classification systems in such low-resourced setting performs significantly better than using speech features. Specifically, when using a phonetic transcription based system over a feature based system, we see average improvements of 12.37% and 13.08% for binary and four-class classification problems respectively, when averaged over 49 different experimental settings.

2021

pdf bib
Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data
Akshat Gupta | Sargam Menghani | Sai Krishna Rallabandi | Alan W Black
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching

Sentiment analysis is an important task in understanding social media content like customer reviews, Twitter and Facebook feeds etc. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and show its applications for the specific use case of sentiment analysis of code-switched data. We use the power of pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, only using pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of the learning dynamics of the algorithm with the aim of answering the question - ‘Does our unsupervised model understand the Code-Switched languages or does it just learn its representations?’. Our unsupervised models compete well with their supervised counterparts, with their performance reaching within 1-7% (weighted F1 scores) when compared to supervised models trained for a two class problem.

pdf bib
Task-Specific Pre-Training and Cross Lingual Transfer for Sentiment Analysis in Dravidian Code-Switched Languages
Akshat Gupta | Sai Krishna Rallabandi | Alan W Black
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Sentiment analysis in Code-Mixed languages has garnered a lot of attention in recent years. It is an important task for social media monitoring and has many applications, as a large chunk of social media data is Code-Mixed. In this paper, we work on the problem of sentiment analysis for Dravidian Code-Switched languages - Tamil-Engish and Malayalam-English, using three different BERT based models. We leverage task-specific pre-training and cross-lingual transfer to improve on previously reported results, with significant improvement for the Tamil-Engish dataset. We also present a multilingual sentiment classification model that has competitive performance on both Tamil-English and Malayalam-English datasets.

pdf bib
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification
Sai Muralidhar Jayanthi | Akshat Gupta
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

In this paper we present our submission for the EACL 2021-Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system was ranked 1st for Kannada, 2nd for Malayalam and 3rd for Tamil.