Parth Patwa


2023

pdf bib
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Mohsin Mohammed | Sai Kandukuri | Neeharika Gupta | Parth Patwa | Anubhab Chatterjee | Vinija Jain | Aman Chadha | Amitava Das
Proceedings of the 6th Workshop on Computational Approaches to Linguistic Code-Switching

The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional information since they are non-recurrent. Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to SPs in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results.We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation.

2021

pdf bib
Image2tweet: Datasets in Hindi and English for Generating Tweets from Images
Rishabh Jha | Varshith Kaki | Varuna Kolla | Shubham Bhagat | Parth Patwa | Amitava Das | Santanu Pal
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Image Captioning as a task that has seen major updates over time. In recent methods, visual-linguistic grounding of the image-text pair is leveraged. This includes either generating the textual description of the objects and entities present within the image in constrained manner, or generating detailed description of these entities as a paragraph. But there is still a long way to go towards being able to generate text that is not only semantically richer, but also contains real world knowledge in it. This is the motivation behind exploring image2tweet generation through the lens of existing image-captioning approaches. At the same time, there is little research in image captioning in Indian languages like Hindi. In this paper, we release Hindi and English datasets for the task of tweet generation given an image. The aim is to generate a specialized text like a tweet, that is not a direct result of visual-linguistic grounding that is usually leveraged in similar tasks, but conveys a message that factors-in not only the visual content of the image, but also additional real world contextual information associated with the event described within the image as closely as possible. Further, We provide baseline DL models on our data and invite researchers to build more sophisticated systems for the problem.

pdf bib
Bitions@DravidianLangTech-EACL2021: Ensemble of Multilingual Language Models with Pseudo Labeling for offence Detection in Dravidian Languages
Debapriya Tula | Prathyush Potluri | Shreyas Ms | Sumanth Doddapaneni | Pranjal Sahu | Rohan Sukumaran | Parth Patwa
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

With the advent of social media, we have seen a proliferation of data and public discourse. Unfortunately, this includes offensive content as well. The problem is exacerbated due to the sheer number of languages spoken on these platforms and the multiple other modalities used for sharing offensive content (images, gifs, videos and more). In this paper, we propose a multilingual ensemble-based model that can identify offensive content targeted against an individual (or group) in low resource Dravidian language. Our model is able to handle code-mixed data as well as instances where the script used is mixed (for instance, Tamil and Latin). Our solution ranked number one for the Malayalam dataset and ranked 4th and 5th for Tamil and Kannada, respectively.

2020

pdf bib
Hater-O-Genius Aggression Classification using Capsule Networks
Parth Patwa | Srinivas Pykl | Amitava Das | Prerana Mukherjee | Viswanath Pulabaigari
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Contending hate speech in social media is one of the most challenging social problems of our time. There are various types of anti-social behavior in social media. Foremost of them is aggressive behavior, which is causing many social issues such as affecting the social lives and mental health of social media users. In this paper, we propose an end-to-end ensemble-based architecture to automatically identify and classify aggressive tweets. Tweets are classified into three categories - Covertly Aggressive, Overtly Aggressive, and Non-Aggressive. The proposed architecture is an ensemble of smaller subnetworks that are able to characterize the feature embeddings effectively. We demonstrate qualitatively that each of the smaller subnetworks is able to learn unique features. Our best model is an ensemble of Capsule Networks and results in a 65.2% F1 score on the Facebook test set, which results in a performance gain of 0.95% over the TRAC-2018 winners. The code and the model weights are publicly available at https://github.com/parthpatwa/Hater-O-Genius-Aggression-Classification-using-Capsule-Networks.

pdf bib
Aggression and Misogyny Detection using BERT: A Multi-Task Approach
Niloofar Safi Samghabadi | Parth Patwa | Srinivas PYKL | Prerana Mukherjee | Amitava Das | Thamar Solorio
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

In recent times, the focus of the NLP community has increased towards offensive language, aggression, and hate-speech detection. This paper presents our system for TRAC-2 shared task on “Aggression Identification” (sub-task A) and “Misogynistic Aggression Identification” (sub-task B). The data for this shared task is provided in three different languages - English, Hindi, and Bengali. Each data instance is annotated into one of the three aggression classes - Not Aggressive, Covertly Aggressive, Overtly Aggressive, as well as one of the two misogyny classes - Gendered and Non-Gendered. We propose an end-to-end neural model using attention on top of BERT that incorporates a multi-task learning paradigm to address both the sub-tasks simultaneously. Our team, “na14”, scored 0.8579 weighted F1-measure on the English sub-task B and secured 3rd rank out of 15 teams for the task. The code and the model weights are publicly available at https://github.com/NiloofarSafi/TRAC-2. Keywords: Aggression, Misogyny, Abusive Language, Hate-Speech Detection, BERT, NLP, Neural Networks, Social Media

pdf bib
SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Parth Patwa | Gustavo Aguilar | Sudipta Kar | Suraj Pandey | Srinivas PYKL | Björn Gambäck | Tanmoy Chakraborty | Thamar Solorio | Amitava Das
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi-English)and Spanglish (Spanish-English) corpora annotated with word-level language identification and sentence-level sentiment labels. These corpora are comprised of 20K and 19K examples, respectively. The sentiment labels are - Positive, Negative, and Neutral. SentiMix attracted 89 submissions in total including 61 teams that participated in the Hinglish contest and 28 submitted systems to the Spanglish competition. The best performance achieved was 75.0% F1 score for Hinglish and 80.6% F1 for Spanglish. We observe that BERT-like models and ensemble methods are the most common and successful approaches among the participants.