Tathagata Raha


2022

pdf bib
IIITH at SemEval-2022 Task 5: A comparative study of deep learning models for identifying misogynous memes
Tathagata Raha | Sagar Joshi | Vasudeva Varma
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper provides a comparison of different deep learning methods for identifying misogynous memes for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification. In this task, we experiment with architectures in the identification of misogynous content in memes by making use of text and image-based information. The different deep learning methods compared in this paper are: (i) unimodal image or text models (ii) fusion of unimodal models (iii) multimodal transformers models and (iv) transformers further pretrained on a multimodal task. From our experiments, we found pretrained multimodal transformer architectures to strongly outperform the models involving the fusion of representation from both the modalities.

pdf bib
Towards Capturing Changes in Mood and Identifying Suicidality Risk
Sravani Boinepelli | Shivansh Subramanian | Abhijeeth Singam | Tathagata Raha | Vasudeva Varma
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

This paper describes our systems for CLPsych?s 2022 Shared Task. Subtask A involves capturing moments of change in an individual?s mood over time, while Subtask B asked us to identify the suicidality risk of a user. We explore multiple machine learning and deep learning methods for the same, taking real-life applicability into account while considering the design of the architecture. Our team achieved top results in different categories for both subtasks. Task A was evaluated on a post-level (using macro averaged F1) and on a window-based timeline level (using macro-averaged precision and recall). We scored a post-level F1 of 0.520 and ranked second with a timeline-level recall of 0.646. Task B was a user-level task where we also came in second with a micro F1 of 0.520 and scored third place on the leaderboard with a macro F1 of 0.380.

2021

pdf bib
IIITH at SemEval-2021 Task 7: Leveraging transformer-based humourous and offensive text detection architectures using lexical and hurtlex features and task adaptive pretraining
Tathagata Raha | Ishan Sanjeev Upadhyay | Radhika Mamidi | Vasudeva Varma
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our approach (IIITH) for SemEval-2021 Task 5: HaHackathon: Detecting and Rating Humor and Offense. Our results focus on two major objectives: (i) Effect of task adaptive pretraining on the performance of transformer based models (ii) How does lexical and hurtlex features help in quantifying humour and offense. In this paper, we provide a detailed description of our approach along with comparisions mentioned above.

2019

pdf bib
Development of POS tagger for English-Bengali Code-Mixed data
Tathagata Raha | Sainik Mahata | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 16th International Conference on Natural Language Processing

Code-mixed texts are widespread nowadays due to the advent of social media. Since these texts combine two languages to formulate a sentence, it gives rise to various research problems related to Natural Language Processing. In this paper, we try to excavate one such problem, namely, Parts of Speech tagging of code-mixed texts. We have built a system that can POS tag English-Bengali code-mixed data where the Bengali words were written in Roman script. Our approach initially involves the collection and cleaning of English-Bengali code-mixed tweets. These tweets were used as a development dataset for building our system. The proposed system is a modular approach that starts by tagging individual tokens with their respective languages and then passes them to different POS taggers, designed for different languages (English and Bengali, in our case). Tags given by the two systems are later joined together and the final result is then mapped to a universal POS tag set. Our system was checked using 100 manually POS tagged code-mixed sentences and it returned an accuracy of 75.29%.