Prathyush Potluri


2021

Bitions@DravidianLangTech-EACL2021: Ensemble of Multilingual Language Models with Pseudo Labeling for offence Detection in Dravidian Languages
Debapriya Tula | Prathyush Potluri | Shreyas Ms | Sumanth Doddapaneni | Pranjal Sahu | Rohan Sukumaran | Parth Patwa
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

With the advent of social media, we have seen a proliferation of data and public discourse. Unfortunately, this includes offensive content as well. The problem is exacerbated by the sheer number of languages spoken on these platforms and the many other modalities used for sharing offensive content (images, GIFs, videos and more). In this paper, we propose a multilingual ensemble-based model that can identify offensive content targeted against an individual (or group) in low-resource Dravidian languages. Our model is able to handle code-mixed data as well as instances where the script used is mixed (for instance, Tamil and Latin). Our solution ranked first on the Malayalam dataset, and fourth and fifth on the Tamil and Kannada datasets, respectively.

2020

Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking
Rutvik Vijjali | Prathyush Potluri | Siddharth Kumar | Sundeep Teki
Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The rapid advancement of technology in online communication via social media platforms has led to a prolific rise in the spread of misinformation and fake news. Fake news is especially rampant in the current COVID-19 pandemic, leading people to believe false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos and potential health hazards. We developed a two-stage automated pipeline for COVID-19 fake news detection using state-of-the-art machine learning models for natural language processing. The first model leverages a novel fact-checking algorithm that retrieves the most relevant facts concerning user queries about particular COVID-19 claims. The second model verifies the level of “truth” in the queried claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models, ranging from those based on classical text-based features to more contextual Transformer-based models, and observe that a model pipeline based on BERT and ALBERT for the two stages, respectively, yields the best results.
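
The two-stage structure described in this abstract (retrieve the most relevant fact, then score the claim against it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract reports BERT and ALBERT models for the two stages, whereas the sketch below swaps in bag-of-words cosine similarity for retrieval and a crude token-overlap placeholder for entailment. All function names (`retrieve`, `toy_entailment`, `check_claim`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(claim, facts, top_k=1):
    """Stage 1 (assumed stand-in): rank facts by lexical similarity to the claim."""
    query = Counter(tokenize(claim))
    scored = [(cosine(query, Counter(tokenize(f))), f) for f in facts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_entailment(premise, hypothesis):
    """Stage 2 placeholder: fraction of hypothesis tokens found in the premise.

    This is NOT textual entailment; it ignores negation and word order.
    A trained entailment model would be plugged in here instead.
    """
    premise_tokens = set(tokenize(premise))
    hypothesis_tokens = set(tokenize(hypothesis))
    if not hypothesis_tokens:
        return 0.0
    return len(hypothesis_tokens & premise_tokens) / len(hypothesis_tokens)


def check_claim(claim, facts, entail_fn=toy_entailment):
    """Run both stages: retrieve the best-matching fact, then score the claim."""
    retrieval_score, fact = retrieve(claim, facts, top_k=1)[0]
    return {
        "fact": fact,
        "retrieval_score": retrieval_score,
        "truth_score": entail_fn(fact, claim),
    }
```

Calling `check_claim(claim, facts)` with a list of fact strings returns the retrieved fact together with both scores. Because the placeholder scorer only measures token overlap, it cannot distinguish a fact that supports a claim from one that negates it, which is why the second stage calls for a real entailment model passed in through `entail_fn`.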