Surendrabikram Thapa


2024

Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Ali Hürriyetoğlu | Hristo Tanev | Surendrabikram Thapa | Gökçe Uludoğan
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Hariram Veeramani | Raghav Jain | Sandesh Jain | Francielle Vargas | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Addressing the need for effective hate speech moderation in contemporary digital discourse, the Multimodal Hate Speech Event Detection Shared Task made its debut at CASE 2023, co-located with RANLP 2023. Building upon its success, an extended version of the shared task was organized at the CASE workshop in EACL 2024. Similar to the earlier iteration, in this shared task, participants address hate speech detection through two subtasks. Subtask A is a binary classification problem, assessing whether text-embedded images contain hate speech. Subtask B goes further, demanding the identification of hate speech targets, such as individuals, communities, and organizations within text-embedded images. Performance is evaluated using the macro F1-score metric in both subtasks. With a total of 73 registered participants, the shared task witnessed remarkable achievements, with the best F1-scores in Subtask A and Subtask B reaching 87.27% and 80.05%, respectively, surpassing the leaderboard of the previous CASE 2023 shared task. This paper provides a comprehensive overview of the performance of seven teams that submitted results for Subtask A and five teams for Subtask B.

Stance and Hate Event Detection in Tweets Related to Climate Activism - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Shuvam Shiwakoti | Hariram Veeramani | Raghav Jain | Guneet Singh Kohli | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Social media plays a pivotal role in global discussions, including on climate change. The variety of opinions expressed ranges from supportive to oppositional, with some instances of hate speech. Recognizing the importance of understanding these varied perspectives, the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) at EACL 2024 hosted a shared task focused on detecting stances and hate speech in climate activism-related tweets. This task was divided into three subtasks: subtasks A and B concentrated on identifying hate speech and its targets, while subtask C focused on stance detection. Participants’ performance was evaluated using the macro F1-score. With over 100 teams participating, the highest F1 scores achieved were 91.44% in subtask C, 78.58% in subtask B, and 74.83% in subtask A. This paper details the methodologies of 24 teams that submitted their results to the competition’s leaderboard.

A Concise Report of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Surendrabikram Thapa | Gökçe Uludoğan | Somaiyeh Dehghan | Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

In this paper, we provide a brief overview of the 7th workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) co-located with EACL 2024. This workshop consisted of regular papers, system description papers submitted by shared task participants, and overview papers of shared tasks held. This workshop series has been bringing together experts and enthusiasts from technical and social science fields, providing a platform for better understanding event information. This workshop not only advances text-based event extraction but also facilitates research in event extraction in multimodal settings.

2023

ADEPT: Adapter-based Efficient Prompt Tuning Approach for Language Models
Aditya Shah | Surendrabikram Thapa | Aneesh Jain | Lifu Huang
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

Temporal Tides of Emotional Resonance: A Novel Approach to Identify Mental Health on Social Media
Usman Naseem | Surendrabikram Thapa | Qi Zhang | Junaid Rashid | Liang Hu | Mehwish Nasim
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media

Event Causality Identification - Shared Task 3, CASE 2023
Fiona Anting Tan | Hansi Hettiarachchi | Ali Hürriyetoğlu | Nelleke Oostdijk | Onur Uca | Surendrabikram Thapa | Farhana Ferdousi Liza
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The Event Causality Identification Shared Task of CASE 2023 is the second iteration of a shared task centered around the Causal News Corpus. Two subtasks were involved: In Subtask 1, participants were challenged to predict if a sentence contains a causal relation or not. In Subtask 2, participants were challenged to identify the Cause, Effect, and Signal spans given an input causal sentence. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper includes an overview of the work of the ten teams that submitted their results to our competition and the six system description papers that were received. The highest F1 scores achieved for Subtask 1 and 2 were 84.66% and 72.79%, respectively.

Multimodal Hate Speech Event Detection - Shared Task 4, CASE 2023
Surendrabikram Thapa | Farhan Jafri | Ali Hürriyetoğlu | Francielle Vargas | Roy Ka-Wei Lee | Usman Naseem
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Moderating hate speech and identifying its targets is a critical need in contemporary digital discourse. To address this need, the shared task Multimodal Hate Speech Event Detection was organized at the sixth CASE workshop, co-located with RANLP 2023. The shared task has two subtasks. Subtask A required participants to treat hate speech detection as a binary problem, i.e., to detect whether a given text-embedded image contains hate speech. Subtask B required participants to identify the targets of hate speech, namely individual, community, and organization targets, in text-embedded images. For both subtasks, participants were ranked by F1-score. The best F1-scores in Subtask A and Subtask B were 85.65 and 76.34, respectively. This paper provides a comprehensive overview of the performance of the 13 teams that submitted results for Subtask A and the 10 teams that submitted results for Subtask B.

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Osman Mutlu | Surendrabikram Thapa | Fiona Anting Tan | Erdem Yörük
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

We provide a summary of the sixth edition of the CASE workshop, held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to progress in text-based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.

Assessing Political Inclination of Bangla Language Models
Surendrabikram Thapa | Ashwarya Maratha | Khan Md Hasib | Mehwish Nasim | Usman Naseem
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Natural language processing has advanced with AI-driven language models (LMs), which are applied widely, from text generation to question answering. These models are pre-trained on a wide spectrum of data sources, enhancing accuracy and responsiveness. However, this process inadvertently entails the absorption of the diverse viewpoints inherent in the training data. The political leaning that LMs acquire from such viewpoints remains little explored. For a low-resource language like Bangla, this area of research is nearly non-existent. To bridge this gap, we comprehensively analyze biases present in Bangla language models, specifically focusing on social and economic dimensions. Our findings reveal the inclinations of various LMs, which will provide insights into ethical considerations and limitations associated with deploying Bangla LMs.

LowResourceNLU at BLP-2023 Task 1 & 2: Enhancing Sentiment Classification and Violence Incitement Detection in Bangla Through Aggregated Language Models
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Violence incitement detection and sentiment analysis hold significant importance in the field of natural language processing. However, in the case of the Bangla language, there are unique challenges due to its low-resource nature. In this paper, we address these challenges by presenting an innovative approach that leverages aggregated BERT models for two tasks at the BLP workshop in EMNLP 2023, specifically tailored for Bangla. Task 1 focuses on violence-inciting text detection, while task 2 centers on sentiment analysis. Our approach combines fine-tuning with textual entailment (utilizing BanglaBERT), Masked Language Model (MLM) training (making use of BanglaBERT), and the use of standalone Multilingual BERT. This comprehensive framework significantly enhances the accuracy of sentiment classification and violence incitement detection in Bangla text. Our method achieved the 11th rank in task 1 with an F1-score of 73.47 and the 4th rank in task 2 with an F1-score of 71.73. This paper provides a detailed system description along with an analysis of the impact of each component of our framework.

Breaking Barriers: Exploring the Diagnostic Potential of Speech Narratives in Hindi for Alzheimer’s Disease
Kritesh Rauniyar | Shuvam Shiwakoti | Sweta Poudel | Surendrabikram Thapa | Usman Naseem | Mehwish Nasim
Proceedings of the 5th Clinical Natural Language Processing Workshop

Alzheimer’s Disease (AD) is a neurodegenerative disorder that affects cognitive abilities and memory, especially in older adults. One of the challenges of AD is that it can be difficult to diagnose in its early stages. However, recent research has shown that changes in language, including speech decline and difficulty in processing information, can be important indicators of AD and may help with early detection. Hence, the speech narratives of patients can be useful in diagnosing the early stages of Alzheimer’s disease. While previous work has shown the potential of using speech narratives to diagnose AD in high-resource languages, this work explores the possibility of using a low-resource language, namely Hindi, to diagnose AD. In this paper, we present a dataset specifically for analyzing AD in the Hindi language, along with experimental results using various state-of-the-art algorithms to assess the diagnostic potential of speech narratives in Hindi. Our analysis suggests that speech narratives in the Hindi language have the potential to aid in the diagnosis of AD. Our dataset and code are made publicly available at https://github.com/rkritesh210/DementiaBankHindi.

Reducing Knowledge Noise for Improved Semantic Analysis in Biomedical Natural Language Processing Applications
Usman Naseem | Surendrabikram Thapa | Qi Zhang | Liang Hu | Anum Masood | Mehwish Nasim
Proceedings of the 5th Clinical Natural Language Processing Workshop

Graph-based techniques have gained traction for representing and analyzing data in various natural language processing (NLP) tasks. Knowledge graph-based language representation models have shown promising results in leveraging domain-specific knowledge for NLP tasks, particularly in the biomedical NLP field. However, such models have limitations, including knowledge noise and neglect of contextual relationships, leading to potential semantic errors and reduced accuracy. To address these issues, this paper proposes two novel methods. The first method combines a knowledge graph-based language model with nearest-neighbor models to incorporate semantic and category information from neighboring instances. The second method integrates a knowledge graph-based language model with graph neural networks (GNNs) to leverage feature information from neighboring nodes in the graph. Experiments on relation extraction (RE) and classification tasks in English and Chinese language datasets demonstrate significant performance improvements with both methods, highlighting their potential for enhancing the performance of language models and improving NLP applications in the biomedical domain.

Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

In the evolving landscape of Environmental, Social, and Corporate Governance (ESG) impact assessment, the ML-ESG-2 shared task proposes identifying ESG impact types. To address this challenge, we present a comprehensive system leveraging ensemble learning techniques, capitalizing on early and late fusion approaches. Our approach employs four distinct models: mBERT, FlauBERT-base, ALBERT-base-v2, and a Multi-Layer Perceptron (MLP) incorporating Latent Semantic Analysis (LSA) and Term Frequency-Inverse Document Frequency (TF-IDF) features. Through extensive experimentation, we find that our early fusion ensemble approach, featuring the integration of LSA, TF-IDF, mBERT, FlauBERT-base, and ALBERT-base-v2, delivers the best performance. Our system offers a comprehensive ESG impact type identification solution, contributing to the responsible and sustainable decision-making processes vital in today’s financial and corporate governance landscape.

KnowTellConvince at ArAIEval Shared Task: Disinformation and Persuasion Detection in Arabic using Similar and Contrastive Representation Alignment
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

In an era of widespread digital communication, the challenge of identifying and countering disinformation has become increasingly critical. However, compared to the solutions available in the English language, the resources and strategies for tackling this multifaceted problem in Arabic are relatively scarce. To address this issue, this paper presents our solutions to tasks in ArAIEval 2023. Task 1 focuses on detecting persuasion techniques, while Task 2 centers on disinformation detection within Arabic text. Leveraging a multi-head model architecture, fine-tuning techniques, sequential learning, and innovative activation functions, our contributions significantly enhance the accuracy of persuasion technique detection and disinformation detection. Beyond improving performance, our work fills a critical research gap in content analysis for Arabic, empowering individuals, communities, and digital platforms to combat deceptive content effectively and preserve the credibility of information sources within the Arabic-speaking world.

DialectNLU at NADI 2023 Shared Task: Transformer Based Multitask Approach Jointly Integrating Dialect and Machine Translation Tasks in Arabic
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

With approximately 400 million speakers worldwide, Arabic ranks as the fifth most-spoken language globally, necessitating advancements in natural language processing. This paper addresses this need by presenting a system description of the approaches employed for the subtasks outlined in the Nuanced Arabic Dialect Identification (NADI) task at EMNLP 2023. For the first subtask, involving closed country-level dialect identification, we employ an ensemble of two Arabic language models. Similarly, for the second subtask, focused on closed dialect to Modern Standard Arabic (MSA) machine translation, our approach combines sequence-to-sequence models, all trained on an Arabic-specific dataset. Our team ranks 10th and 3rd on subtask 1 and subtask 2, respectively.

LowResContextQA at Qur’an QA 2023 Shared Task: Temporal and Sequential Representation Augmented Question Answering Span Detection in Arabic
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

The Qur’an holds immense theological and historical significance, and developing a technology-driven solution for answering questions from this sacred text is of paramount importance. This paper presents our approach to task B of Qur’an QA 2023, part of EMNLP 2023, addressing this challenge by proposing a robust method for extracting answers from Qur’anic passages. Leveraging the Qur’anic Reading Comprehension Dataset (QRCD) v1.2, we employ innovative techniques and advanced models to improve the precision and contextuality of answers derived from Qur’anic passages. Our methodology encompasses the utilization of start and end logits, Long Short-Term Memory (LSTM) networks, and fusion mechanisms, contributing to the ongoing dialogue at the intersection of technology and spirituality.

Automated Citation Function Classification and Context Extraction in Astrophysics: Leveraging Paraphrasing and Question Answering
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the Second Workshop on Information Extraction from Scientific Publications

2022

A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict
Surendrabikram Thapa | Aditya Shah | Farhan Jafri | Usman Naseem | Imran Razzak
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

This paper presents a new multi-modal dataset for identifying hateful content on social media, consisting of 5,680 text-image pairs collected from Twitter and annotated across two labels. Experimental analysis of the presented dataset shows that understanding both modalities is essential for detecting hateful content, which is confirmed by our experiments with several state-of-the-art multi-modal models. In future work, we plan to extend the dataset in size. We further plan to develop new multi-modal models tailored explicitly to hate-speech detection, aiming for a deeper understanding of the relation between text and image. It would also be interesting to explore which social entities a given hate speech tweet targets.