Raj Pranesh


2022

pdf bib
Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets
Raj Pranesh
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)

Social media platforms, such as Twitter, often provide firsthand news during the outbreak of a crisis. It is extremely essential to process these facts quickly to plan the response efforts for minimal loss. Therefore, in this paper, we present an analysis of various multimodal feature fusion techniques to analyze and classify disaster tweets into multiple crisis events via transfer learning. In our study, we utilized three image models pre-trained on ImageNet dataset and three fine-tuned language models to learn the visual and textual features of the data and combine them to make predictions. We have presented a systematic analysis of multiple intra-modal and cross-modal fusion strategies and their effect on the performance of the multimodal disaster classification system. In our experiment, we used 8,242 disaster tweets, each comprising image, and text data with five disaster event classes. The results show that the multimodal with transformer-attention mechanism and factorized bilinear pooling (FBP) for intra-modal and cross-modal feature fusion respectively achieved the best performance.

2021

pdf bib
CMTA: COVID-19 Misinformation Multilingual Analysis on Twitter
Raj Pranesh | Mehrdad Farokhenajd | Ambesh Shekhar | Genoveva Vargas-Solar
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

The internet has actually come to be an essential resource of health knowledge for individuals around the world in the present situation of the coronavirus condition pandemic(COVID-19). During pandemic situations, myths, sensationalism, rumours and misinformation, generated intentionally or unintentionally, spread rapidly through social networks. Twitter is one of these popular social networks people use to share COVID-19 related news, information, and thoughts that reflect their perception and opinion about the pandemic. Evaluation of tweets for recognizing misinformation can create beneficial understanding to review the top quality and also the readability of online information concerning the COVID-19. This paper presents a multilingual COVID-19 related tweet analysis method, CMTA, that uses BERT, a deep learning model for multilingual tweet misinformation detection and classification. CMTA extracts features from multilingual textual data, which is then categorized into specific information classes. Classification is done by a Dense-CNN model trained on tweets manually annotated into information classes (i.e., ‘false’, ‘partly false’, ‘misleading’). The paper presents an analysis of multilingual tweets from February to June, showing the distribution type of information spread across different languages. To access the performance of the CMTA multilingual model, we performed a comparative analysis of 8 monolingual model and CMTA for the misinformation detection task. The results show that our proposed CMTA model has surpassed various monolingual models which consolidated the fact that through transfer learning a multilingual framework could be developed.

2020

pdf bib
CLPLM: Character Level Pretrained Language Model for ExtractingSupport Phrases for Sentiment Labels
Raj Pranesh | Sumit Kumar | Ambesh Shekhar
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper, we have designed a character-level pre-trained language model for extracting support phrases from tweets based on the sentiment label. We also propose a character-level ensemble model designed by properly blending Pre-trained Contextual Embeddings (PCE) models- RoBERTa, BERT, and ALBERT along with Neural network models- RNN, CNN and WaveNet at different stages of the model. For a given tweet and associated sentiment label, our model predicts the span of phrases in a tweet that prompts the particular sentiment in the tweet. In our experiments, we have explored various model architectures and configuration for both single as well as ensemble models. We performed a systematic comparative analysis of all the model’s performance based on the Jaccard score obtained. The best performing ensemble model obtained the highest Jaccard scores of 73.5, giving it a relative improvement of 2.4% over the best performing single RoBERTa based character-level model, at 71.5(Jaccard score).

pdf bib
Analysis of Resource-efficient Predictive Models for Natural Language Processing
Raj Pranesh | Ambesh Shekhar
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

In this paper, we presented an analyses of the resource efficient predictive models, namely Bonsai, Binary Neighbor Compression(BNC), ProtoNN, Random Forest, Naive Bayes and Support vector machine(SVM), in the machine learning field for resource constraint devices. These models try to minimize resource requirements like RAM and storage without hurting the accuracy much. We utilized these models on multiple benchmark natural language processing tasks, which were sentimental analysis, spam message detection, emotion analysis and fake news classification. The experiment results shows that the tree-based algorithm, Bonsai, surpassed the rest of the machine learning algorithms by achieve higher accuracy scores while having significantly lower memory usage.