Pankaj Dadure
2025
Sentiment Analysis of Arabic Tweets Using Large Language Models
Pankaj Dadure
|
Ananya Dixit
|
Kunal Tewatia
|
Nandini Paliwal
|
Anshika Malla
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
In the digital era, sentiment analysis has become an indispensable tool for understanding public sentiment, optimizing market strategies, and enhancing customer engagement across diverse sectors. While significant advancements have been made in sentiment analysis for high-resource languages such as English and French, this study focuses on Arabic, a low-resource language, and addresses its unique challenges such as morphological complexity, diverse dialects, and limited linguistic resources. Existing works in Arabic sentiment analysis have utilized deep learning architectures like LSTM, BiLSTM, and CNN-LSTM, alongside embedding techniques such as Word2Vec and contextualized models like AraBERT. Building on this foundation, our research investigates sentiment classification of Arabic tweets, categorizing them as positive or negative, using embeddings derived from three large language models (LLMs): Universal Sentence Encoder (USE), XLM-RoBERTa base (XLM-R base), and MiniLM-L12-v2. Experimental results demonstrate that incorporating emojis in the dataset and using the MiniLM embeddings yields an accuracy of 85.98%, whereas excluding emojis and using embeddings from the XLM-R base results in a lower accuracy of 78.98%. These findings highlight the impact of both dataset composition and embedding technique on Arabic sentiment analysis performance.
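As a rough illustration of the pipeline the abstract describes (frozen LLM embeddings fed to a lightweight classifier), the sketch below encodes tweets with a sentence-transformers MiniLM checkpoint and trains a logistic-regression head; the specific checkpoint name, the classifier, and the toy data are assumptions, not the paper's exact configuration.

```python
# Minimal sketch, not the paper's exact setup: sentence embeddings from a
# multilingual MiniLM checkpoint (assumed) + a simple linear classifier.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy placeholders: Arabic tweets (with or without emojis), 1 = positive, 0 = negative.
train_tweets = ["تجربة رائعة 😍", "خدمة سيئة جدا 😡"]
train_labels = [1, 0]
test_tweets  = ["منتج ممتاز"]
test_labels  = [1]

# The abstract only names "MiniLM-L12-v2"; this multilingual variant is an assumption.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
X_train = encoder.encode(train_tweets)
X_test  = encoder.encode(test_tweets)

# Frozen embeddings + logistic regression (illustrative classifier choice).
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print("accuracy:", accuracy_score(test_labels, clf.predict(X_test)))
```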
2022
Image Caption Generation for Low-Resource Assamese Language
Prachurya Nath
|
Prottay Kumar Adhikary
|
Pankaj Dadure
|
Partha Pakray
|
Riyanka Manna
|
Sivaji Bandyopadhyay
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Image captioning is a prominent Artificial Intelligence (AI) research area that deals with visual recognition and linguistic description of images. It is an interdisciplinary field concerned with how computers can see and understand digital images and videos, and describe them in a language known to humans. Constructing a meaningful sentence requires both structural and semantic information about the language. This paper highlights the contribution of image caption generation for the Assamese language. The unavailability of an image caption generation system for Assamese is an open problem for AI-NLP researchers, and research in this direction is still at an early stage. To achieve our defined objective, we have used the encoder-decoder framework, which combines Convolutional Neural Networks and Recurrent Neural Networks. The experiments were conducted on the Flickr30k and COCO Captions datasets, which are originally in English; we translated them into Assamese using a state-of-the-art Machine Translation (MT) system for this work.
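A minimal sketch of such a CNN-RNN encoder-decoder captioner is given below; the ResNet-50 backbone, layer sizes, and teacher-forcing setup are illustrative assumptions rather than the configuration used in the paper.

```python
# Hedged sketch of a CNN encoder + RNN decoder captioning model
# (assumed backbone and dimensions; not the paper's exact architecture).
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: frozen ResNet-50 truncated before the classification head.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.img_proj = nn.Linear(2048, embed_dim)
        # RNN decoder over (Assamese) caption token ids.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)        # (B, 2048) pooled image features
        feats = self.img_proj(feats).unsqueeze(1)      # image feature as the first "token"
        embeds = self.embed(captions[:, :-1])          # teacher forcing on caption prefix
        hidden, _ = self.lstm(torch.cat([feats, embeds], dim=1))
        return self.out(hidden)                        # (B, T, vocab) next-token logits
```

Training such a model would minimize cross-entropy between these logits and the machine-translated Assamese caption tokens.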
English to Bengali Multimodal Neural Machine Translation using Transliteration-based Phrase Pairs Augmentation
Sahinur Rahman Laskar
|
Pankaj Dadure
|
Riyanka Manna
|
Partha Pakray
|
Sivaji Bandyopadhyay
Proceedings of the 9th Workshop on Asian Translation
Automatic translation of one natural language into another is a popular task in natural language processing. Although the deep learning-based technique known as neural machine translation (NMT) is a widely accepted machine translation approach, it needs an adequate amount of training data, which is a challenging issue for low-resource pair translation. Moreover, multimodal approaches utilize both text and visual features to improve low-resource pair translation. WAT2022 (Workshop on Asian Translation 2022, hosted by COLING 2022) organized the English to Bengali multimodal translation task, in which we participated as team CNLP-NITS-PP in two tracks: 1) text-only and 2) multimodal translation. Herein, we propose a transliteration-based phrase pair augmentation approach, which shows improvement in the multimodal translation task and achieves benchmark results on the Bengali Visual Genome 1.0 dataset. We attained the best results on the challenge and evaluation test sets for English to Bengali multimodal translation, with BLEU scores of 28.70 and 43.90 and RIBES scores of 0.688931 and 0.780669, respectively.
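The core augmentation idea can be sketched as follows: transliterated phrase pairs are appended to the parallel training corpus before NMT training. The transliterate_to_bengali helper below is a hypothetical placeholder standing in for whatever transliteration tool the authors actually used.

```python
# Hedged sketch of transliteration-based phrase pair augmentation.
# `transliterate_to_bengali` is a hypothetical placeholder, not a real API.

def transliterate_to_bengali(english_phrase: str) -> str:
    """Placeholder: map an English phrase to its Bengali-script transliteration."""
    raise NotImplementedError("plug in an actual transliteration tool here")

def augment_with_phrase_pairs(parallel_corpus, english_phrases):
    """Append (English phrase, transliterated Bengali phrase) pairs to the
    existing (source, target) sentence pairs used for NMT training."""
    augmented = list(parallel_corpus)
    for phrase in english_phrases:
        augmented.append((phrase, transliterate_to_bengali(phrase)))
    return augmented
```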