Surendrabikram Thapa - ACL Anthology

Surendrabikram Thapa

2026

We present SemEval-2026 Task 9, a shared task on online polarization detection, covering 22 languages and comprising over 110K annotated instances. Each data instance is multi-labeled with the presence of polarization, polarization type, and polarization manifestation. Participants were asked to predict labels in three subtasks: (1) detecting the presence of polarization, (2) identifying the type of polarization, and (3) recognizing the polarization manifestation. The three tasks attracted over 1,000 participants worldwide and more than 10k submissions on Codabench. We received final submissions from 67 teams and 69 system description papers. We report the baseline results and analyze the performance of the best-performing systems, highlighting the most common approaches and the most effective methods across different subtasks and languages. The dataset and other resources for this task are publicly available.

Self-Explaining Hate Speech Detection with Moral Rationales
Francielle Vargas | Jackson Trager | Diego Alves | Matteo Guida | Surendrabikram Thapa | Berk Atıl | Daryna Dementieva | Andrew J Smart | Ameeta Agrawal
Findings of the Association for Computational Linguistics: ACL 2026

Existing hate speech detection models are often opaque and rely on surface-level lexical cues, which makes them vulnerable to spurious correlations and limits robustness, interpretability and cultural contextualization. We propose Supervised Moral Rationale Attention (SMRA), the first self-explaining hate speech detection framework to incorporate moral rationales as direct supervision for attention alignment. Based on Moral Foundations Theory, SMRA aligns token-level attention with expert-annotated moral rationales, guiding models to attend to morally salient spans. Unlike prior rationale-supervised or post-hoc approaches, SMRA integrates moral rationale supervision directly into the training objective, producing inherently interpretable and contextualized explanations. To support our framework, we also introduce HateBRMoralXplain, a Brazilian Portuguese benchmark dataset annotated with hate labels, moral categories, token-level moral rationales, and socio-political metadata. Across binary hate speech detection and multi-label moral sentiment classification, SMRA consistently improves performance while enhancing both faithful and plausible explanations. Although explanations become more concise, sufficiency decreases, indicating more compact and informative rationales. Fairness remains stable, suggesting that improvements in explanation quality do not introduce significant bias trade-offs.

POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
Usman Naseem | Robert Geislinger | Juan Ren | Sarah Kohail | Rudy Alexandro Garrido Veliz | P Sam Sahil | Yiran Zhang | Idris Abdulmumin | Marco Antonio Stranisci | Özge Alacam | Cengiz Acarturk | Aisha Jabr | Saba Anwar | Abinew Ali Ayele | Simona Frenda | Alessandra Teresa Cignarella | Elena Tutubalina | Oleg Rogov | Aung Kyaw Htet | Xintong Wang | Surendrabikram Thapa | Kritesh Rauniyar | Tanmoy Chakraborty | MD Arfeen Zeeshan | Dheeraj Kodati | Satya Keerthi | Sahar Moradizeyveh | Firoj Alam | Md Arid Hasan | Syed Ishtiaque Ahmed | Ye Kyaw Thu | Shantipriya Parida | Ihsan Ayyub Qazi | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Clemencia Siro | Jane Wanjiru Kimani | Ibrahim Said Ahmad | Adem Chanie Ali | Martin Semmann | Chris Biemann | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: ACL 2026

Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multi-event dataset with over 110K instances in 22 languages drawn from diverse online platforms and real-world events. Polarization is annotated along three axes, namely detection, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) fine-tuning six pretrained small language models; and (2) evaluating a range of open and closed large language models in few-shot and zero-shot settings. Results show that while most models perform well on binary polarization detection, they achieve substantially lower performance when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and underscore the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.

Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification
Sujal Maharjan | Astha Shrestha | Lakshmojee Koduru | Sweta Poudel | Shuvam Shiwakoti | Rabin Thapa | Kritesh Rauniyar | Surendrabikram Thapa
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource environments. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus comprising 10,226 Devanagari sentences. The dataset includes annotations for trigger spans and event types, achieving high inter-annotator agreement with Fleiss’ kappa = 0.812 for trigger identification and kappa = 0.855 for event classification. Our dataset was developed through a rigorous iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We conduct benchmarking across a broad spectrum of approaches, including classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) using zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve superior classification performance, while traditional methods and few-shot prompting struggle with the challenges of exact span extraction in morphologically complex contexts. Furthermore, we quantify performance differences between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
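
As a reader's aid, the inter-annotator agreement statistic reported above (Fleiss' kappa) can be sketched as follows. This is a minimal illustration of the standard formula, not the authors' code; the function name `fleiss_kappa` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of lists of labels (hypothetical input layout).
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected agreement from the overall category proportions
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Toy example: three raters, two items, full agreement on each item
toy = [["EVENT", "EVENT", "EVENT"], ["NONE", "NONE", "NONE"]]
print(fleiss_kappa(toy, ["EVENT", "NONE"]))
```

Values near 1 indicate agreement well above chance, which is how the reported 0.812 and 0.855 are usually read.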

Multimodal Identification of Vaccine Content Stance on Social Media
Surendrabikram Thapa | Shuvam Shiwakoti | Siddhant Bikram Shah | Kritesh Rauniyar | Laxmi Thapa | Surabhi Adhikari | Kristina T. Johnson | Ali Hürriyetoğlu | Hristo Tanev | Usman Naseem
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

Vaccination-related memes on social media play an increasingly influential role in shaping public perception of immunization, often spreading both supportive messaging and vaccine-critical narratives through multimodal communication. Detecting such content is challenging due to the combined use of images, embedded text, sarcasm, humor, and cultural references. This paper presents an overview of the Shared Task on Multimodal Identification of Vaccine Critical Content on Social Media, organized as part of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026) at ACL 2026. The task is based on the VaxMeme dataset, a large-scale collection of vaccination-related memes annotated into three classes: Vaccine-critical, Neutral, and Pro-vaccine. A total of 77 participants registered for the competition, with 25 teams submitting systems for evaluation. Participating approaches included transformer-based multimodal architectures, vision-language models, ensemble methods, and instruction-tuned large language models. The best-performing system achieved a macro F1-score of 0.8494. This shared task provides insights into the strengths and limitations of current multimodal approaches for vaccine stance detection and highlights future directions for robust public health misinformation analysis.

Understanding Toxic Behavior in Gaming Communities Using AI to Promote Healthier Digital Spaces
Surendrabikram Thapa | Shuvam Shiwakoti | Siddhant Bikram Shah | Kritesh Rauniyar | Laxmi Thapa | Surabhi Adhikari | Kristina T. Johnson | Ali Hürriyetoğlu | Hristo Tanev | Usman Naseem
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

Online gaming communities are increasingly affected by toxic communication, including harassment, threats, hate speech, and extremist content. Detecting such behavior is challenging due to the short, noisy, multilingual, and highly imbalanced nature of gaming chat data. To advance research in this area, we organized the Shared Task on Fine-Grained Toxicity Detection in Online Gaming at EEUCA 2026, co-located with ACL 2026. The task is based on the GameTox dataset, containing approximately 53,000 annotated chat utterances from World of Tanks across six toxicity categories. A total of 102 participants took part, and 35 teams submitted systems exploring approaches such as domain-adaptive pretraining, multilingual transfer learning, contrastive learning, LLM-based augmentation, and ensemble methods. Systems were evaluated using macro-averaged F1-score, with the top system achieving 0.7041 Macro F1. This paper presents an overview of the shared task, dataset, evaluation framework, participant methods, and key findings.

Overview of the Workshop on Event Extraction and Understanding: Challenges and Applications
Ali Hürriyetoğlu | Surendrabikram Thapa | Hristo Tanev | Laxmi Thapa | Surabhi Adhikari
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

This paper presents an overview of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026), held in conjunction with ACL 2026. Formerly known as CASE, the workshop continues its mission of bringing together researchers from natural language processing, machine learning, computational social science, and related disciplines to advance research on event extraction and understanding. This year’s edition particularly emphasized the growing influence of large language models (LLMs), multimodal learning, and weakly supervised methodologies in event extraction research. The workshop featured six regular research papers covering topics such as low-resource event extraction, reflective multi-agent architectures, symbolic auditing of procedural events, geopolitical event extraction, and generative event extraction strategies. In addition, EEUCA 2026 hosted two shared tasks focusing on toxicity detection in gaming communities and multimodal vaccine-critical meme analysis, attracting broad international participation and encouraging research on socially impactful applications of AI. The workshop highlights current advances, emerging challenges, and future directions in multilingual, multimodal, and socially aware event extraction systems.

Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Ali Hürriyetoğlu | Surendrabikram Thapa | Hristo Tanev | Surabhi Adhikari
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

2025

Proceedings of the 9th Widening NLP Workshop
Chen Zhang | Emily Allaway | Hua Shen | Lesly Miculicich | Yinqiao Li | Meryem M'hamdi | Peerat Limkonchotiwat | Richard He Bai | Santosh T.y.s.s. | Sophia Simeng Han | Surendrabikram Thapa | Wiem Ben Rim
Proceedings of the 9th Widening NLP Workshop

GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities
Usman Naseem | Shuvam Shiwakoti | Siddhant Bikram Shah | Surendrabikram Thapa | Qi Zhang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

The prevalence of toxic behavior in online gaming communities necessitates robust detection methods to ensure user safety. We introduce GameTox, a novel dataset comprising 53K game chat utterances annotated for toxicity detection through intent classification and slot filling. This dataset captures the complex relationship between user intent and specific linguistic features that contribute to toxic interactions. We extensively analyze the dataset to uncover key insights into the nature of toxic speech in gaming environments. Furthermore, we establish baseline performance metrics using state-of-the-art natural language processing and large language models, demonstrating the dataset’s contribution towards enhancing the detection of toxic behavior and revealing the limitations of contemporary models. Our results indicate that leveraging both intent detection and slot filling provides a significantly more granular and context-aware understanding of harmful messages. This dataset serves as a valuable resource to train advanced models that can effectively mitigate toxicity in online gaming and foster healthier digital spaces. Our dataset is publicly available at: https://github.com/shucoll/GameTox.

Probing the Limits of Multilingual Language Understanding: Low-Resource Language Proverbs as LLM Benchmark for AI Wisdom
Surendrabikram Thapa | Kritesh Rauniyar | Hariram Veeramani | Surabhi Adhikari | Imran Razzak | Usman Naseem
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Understanding and interpreting culturally specific language remains a significant challenge for multilingual natural language processing (NLP) systems, particularly for less-resourced languages. To address this problem, this paper introduces PRONE, a novel dataset of 2,830 Nepali proverbs, and evaluates the performance of various language models (LMs) in two tasks: (i) identifying the correct meaning of a proverb from multiple choices, and (ii) categorizing proverbs into predefined thematic categories. The models, including both open-source and proprietary, were tested in zero-shot and few-shot settings with prompts in English and Nepali. While models like GPT-4o demonstrated promising results and achieved the highest performance among LMs, they still fall short of human-level accuracy in understanding and categorizing culturally nuanced content, highlighting the need for more inclusive NLP.

Natural Language Understanding of Devanagari Script Languages: Language Identification, Hate Speech and its Target Detection
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Ahmad Jafri | Surabhi Adhikari | Kengatharaiyer Sarveswaran | Bal Krishna Bal | Hariram Veeramani | Usman Naseem
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

The growing use of Devanagari-script languages such as Hindi, Nepali, Marathi, Sanskrit, and Bhojpuri on social media presents unique challenges for natural language understanding (NLU), particularly in language identification, hate speech detection, and target classification. To address these challenges, we organized a shared task with three subtasks: (i) identifying the language of Devanagari-script text, (ii) detecting hate speech, and (iii) classifying hate speech targets into individual, community, or organization. A curated dataset combining multiple corpora was provided, with splits for training, evaluation, and testing. The task attracted 113 participants, with 32 teams submitting models evaluated on accuracy, precision, recall, and macro F1-score. Participants applied innovative methods, including large language models, transformer models, and multilingual embeddings, to tackle the linguistic complexities of Devanagari-script languages. This paper summarizes the shared task, datasets, and results, and aims to contribute to advancing NLU for low-resource languages and fostering inclusive, culturally aware natural language processing (NLP) solutions.

A Brief Overview of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL)
Kengatharaiyer Sarveswaran | Surendrabikram Thapa | Sana Shams | Ashwini Vaidya | Bal Krishna Bal
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

In this paper, we provide a brief summary of the inaugural workshop on Challenges in Processing South Asian Languages (CHiPSAL) held as part of COLING 2025. The workshop included regular papers, invited keynotes, and shared task papers, fostering a collaborative platform for exploring challenges in processing South Asian languages. The shared task focused on Devanagari-script language understanding, encompassing subtasks on language identification, hate speech detection, and target classification. This workshop series aims to address linguistic and cultural nuances, resource constraints, and orthographic complexities in low-resource South Asian languages while advancing NLP research and promoting multilingual inclusivity.

Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Kengatharaiyer Sarveswaran | Ashwini Vaidya | Bal Krishna Bal | Sana Shams | Surendrabikram Thapa
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

Multimodal Hate, Humor, and Stance Event Detection in Marginalized Sociopolitical Movements
Surendrabikram Thapa | Siddhant Bikram Shah | Kritesh Rauniyar | Shuvam Shiwakoti | Surabhi Adhikari | Hariram Veeramani | Kristina T. Johnson | Ali Hürriyetoğlu | Hristo Tanev | Usman Naseem
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

This paper presents the Shared Task on Multimodal Detection of Hate Speech, Humor, and Stance in Marginalized Socio-Political Movement Discourse, hosted at CASE 2025. The task is built on the PrideMM dataset, a curated collection of 5,063 text-embedded images related to the LGBTQ+ pride movement, annotated for four interrelated subtasks: (A) Hate Speech Detection, (B) Hate Target Classification, (C) Topical Stance Classification, and (D) Intended Humor Detection. Eighty-nine teams registered, with competitive submissions across all subtasks. The results show that multimodal approaches consistently outperform unimodal baselines, particularly for hate speech detection, while fine-grained tasks such as target identification and stance classification remain challenging due to label imbalance, multimodal ambiguity, and implicit or culturally specific content. CLIP-based models and parameter-efficient fusion architectures achieved strong performance, showing promising directions for low-resource and efficient multimodal systems.

Challenges and Applications of Automated Extraction of Socio-political Events at the age of Large Language Models
Surendrabikram Thapa | Surabhi Adhikari | Hristo Tanev | Ali Hürriyetoğlu
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

Socio-political event extraction (SPE) enables automated identification of critical events such as protests, conflicts, and policy shifts from unstructured text. As a foundational tool for journalism, social science research, and crisis response, SPE plays a key role in understanding complex global dynamics. The emergence of large language models (LLMs) like GPT-4 and LLaMA offers new opportunities for flexible, multilingual, and zero-shot SPE. However, applying LLMs to this domain introduces significant risks, including hallucinated outputs, lack of transparency, geopolitical bias, and potential misuse in surveillance or censorship. This position paper critically examines the promises and pitfalls of LLM-driven SPE, drawing on recent datasets and benchmarks. We argue that SPE is a high-stakes application requiring rigorous ethical scrutiny, interdisciplinary collaboration, and transparent design practices. We propose a research agenda focused on reproducibility, participatory development, and building systems that align with democratic values and the rights of affected communities.

Findings and Insights from the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Surendrabikram Thapa | Hristo Tanev | Surabhi Adhikari
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

This paper presents an overview of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE), held in conjunction with RANLP 2025. The workshop featured a range of contributions, including regular research papers, system descriptions from shared task participants, and an overview paper on shared task outcomes. Continuing its tradition, CASE brings together researchers from computational and social sciences to explore the evolving landscape of event extraction. With the rapid advancement of large language models (LLMs), this year’s edition placed particular emphasis on their application to socio-political event extraction. Alongside text-based approaches, the workshop also highlighted the growing interest in multimodal event extraction, addressing complex real-world scenarios across diverse modalities.

Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts
Ali Hürriyetoğlu | Hristo Tanev | Surendrabikram Thapa | Surabhi Adhikari
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

2024

MLInitiative@WILDRE7: Hybrid Approaches with Large Language Models for Enhanced Sentiment Analysis in Code-Switched and Code-Mixed Texts
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the 7th Workshop on Indian Language Data: Resources and Evaluation

Code-switched and code-mixed languages are prevalent in multilingual societies, reflecting the complex interplay of cultures and languages in daily communication. Understanding the sentiment embedded in such texts is crucial for a range of applications, from improving social media analytics to enhancing customer feedback systems. Despite their significance, research in code-mixed and code-switched languages remains limited, particularly in less-resourced languages. This scarcity of research creates a gap in natural language processing (NLP) technologies, hindering their ability to accurately interpret the rich linguistic diversity of global communications. To bridge this gap, this paper presents a novel methodology for sentiment analysis in code-mixed and code-switched texts. Our approach combines the power of large language models (LLMs) and the versatility of the multilingual BERT (mBERT) framework to effectively process and analyze sentiments in multilingual data. By decomposing code-mixed texts into their constituent languages, employing mBERT for named entity recognition (NER) and sentiment label prediction, and integrating these insights into a decision-making LLM, we provide a comprehensive framework for understanding sentiment in complex linguistic contexts. Our system achieves a competitive rank on all subtasks in the Code-mixed Less-Resourced Sentiment analysis (Code-mixed) shared task at WILDRE-7 (LREC-COLING).

Why the Unexpected? Dissecting the Political and Economic Bias in Persian Small and Large Language Models
Ehsan Barkhordar | Surendrabikram Thapa | Ashwarya Maratha | Usman Naseem
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

Recently, language models (LMs) like BERT and large language models (LLMs) like GPT-4 have demonstrated potential in various linguistic tasks such as text generation, translation, and sentiment analysis. However, these abilities carry the risk of perpetuating biases from their training data. Political and economic inclinations play a significant role in shaping these biases. Thus, this research aims to understand political and economic biases in Persian LMs and LLMs, addressing a significant gap in AI ethics and fairness research. Focusing on the Persian language, our research employs a two-step methodology. First, we utilize the political compass test adapted to Persian. Second, we analyze biases present in these models. Our findings indicate the presence of nuanced biases, underscoring the importance of ethical considerations in AI deployments within Persian-speaking contexts.

Analyzing the Dynamics of Climate Change Discourse on Twitter: A New Annotated Corpus and Multi-Aspect Classification
Shuvam Shiwakoti | Surendrabikram Thapa | Kritesh Rauniyar | Akshyat Shah | Aashish Bhandari | Usman Naseem
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The discourse surrounding climate change on social media platforms has emerged as a significant avenue for understanding public sentiments, perspectives, and engagement with this critical global issue. The lack of publicly available datasets, coupled with the neglect of multi-aspect analysis of climate discourse on social media platforms, underscores the need for further advancement in this area. To address this gap, in this paper, we present an extensive exploration of the intricate realm of climate change discourse on Twitter, leveraging a meticulously annotated ClimaConvo dataset comprising 15,309 tweets. Our annotations encompass a rich spectrum, including aspects like relevance, stance, hate speech, the direction of hate, and humor, offering a nuanced understanding of the discourse dynamics. We address the challenges inherent in dissecting online climate discussions and detail our comprehensive annotation methodology. In addition to annotations, we conduct benchmarking assessments across various algorithms for six tasks: relevance detection, stance detection, hate speech identification, hate direction, hate target, and humor analysis. This assessment enhances our grasp of sentiment fluctuations and linguistic subtleties within the discourse. Our analysis extends to exploratory data examination, unveiling tweet distribution patterns, stance prevalence, and hate speech trends. Employing sophisticated topic modeling techniques uncovers underlying thematic clusters, providing insights into the diverse narrative threads woven within the discourse. The findings present a valuable resource for researchers, policymakers, and communicators seeking to navigate the intricacies of climate change discussions. The dataset and resources for this paper are available at https://github.com/shucoll/ClimaConvo.

A Concise Report of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Surendrabikram Thapa | Gökçe Uludoğan | Somaiyeh Dehghan | Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

In this paper, we provide a brief overview of the 7th workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) co-located with EACL 2024. This workshop consisted of regular papers, system description papers submitted by shared task participants, and overview papers of shared tasks held. This workshop series has been bringing together experts and enthusiasts from technical and social science fields, providing a platform for better understanding event information. This workshop not only advances text-based event extraction but also facilitates research in event extraction in multimodal settings.

Stance and Hate Event Detection in Tweets Related to Climate Activism - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Shuvam Shiwakoti | Hariram Veeramani | Raghav Jain | Guneet Singh Kohli | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Social media plays a pivotal role in global discussions, including on climate change. The opinions expressed range from supportive to oppositional, with some instances of hate speech. Recognizing the importance of understanding these varied perspectives, the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) at EACL 2024 hosted a shared task focused on detecting stances and hate speech in climate activism-related tweets. This task was divided into three subtasks: subtasks A and B concentrated on identifying hate speech and its targets, while subtask C focused on stance detection. Participants’ performance was evaluated using the macro F1-score. With over 100 teams participating, the highest F1 scores achieved were 91.44% in subtask C, 78.58% in subtask B, and 74.83% in subtask A. This paper details the methodologies of 24 teams that submitted their results to the competition’s leaderboard.
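
For readers unfamiliar with the ranking metric named above, macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The sketch below is a minimal pure-Python illustration under that standard definition; the function name `macro_f1` and the toy labels are assumptions, not the organizers' evaluation script.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced toy data: two "hate" tweets, four "none" tweets
gold = ["hate", "hate", "none", "none", "none", "none"]
pred = ["hate", "none", "none", "none", "none", "hate"]
print(macro_f1(gold, pred))  # per-class F1 is 0.5 and 0.75, so the mean is 0.625
```

Because each class contributes equally, a system that ignores the minority class is penalized even if its overall accuracy is high.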

Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis - Shared Task at CASE 2024
Surendrabikram Thapa | Kritesh Rauniyar | Farhan Jafri | Hariram Veeramani | Raghav Jain | Sandesh Jain | Francielle Vargas | Ali Hürriyetoğlu | Usman Naseem
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Addressing the need for effective hate speech moderation in contemporary digital discourse, the Multimodal Hate Speech Event Detection Shared Task made its debut at CASE 2023, co-located with RANLP 2023. Building upon its success, an extended version of the shared task was organized at the CASE workshop at EACL 2024. As in the earlier iteration, participants in this shared task address hate speech detection through two subtasks. Subtask A is a binary classification problem, assessing whether text-embedded images contain hate speech. Subtask B goes further, demanding the identification of hate speech targets, such as individuals, communities, and organizations within text-embedded images. Performance is evaluated using the macro F1-score metric in both subtasks. With a total of 73 registered participants, the shared task witnessed remarkable achievements, with the best F1-scores in Subtask A and Subtask B reaching 87.27% and 80.05%, respectively, surpassing the leaderboard of the previous CASE 2023 shared task. This paper provides a comprehensive overview of the performance of seven teams that submitted results for Subtask A and five teams for Subtask B.

Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Ali Hürriyetoğlu | Hristo Tanev | Surendrabikram Thapa | Gökçe Uludoğan
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Large Language Model-based Pipeline for Item Difficulty and Response Time Estimation for Educational Assessments
Hariram Veeramani | Surendrabikram Thapa | Natarajan Balaji Shankar | Abeer Alwan
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This work presents a novel framework for the automated prediction of item difficulty and response time within educational assessments. Utilizing data from the BEA 2024 Shared Task, we integrate Named Entity Recognition, Semantic Role Labeling, and linguistic features to prompt a Large Language Model (LLM). Our best approach achieves an RMSE of 0.308 for item difficulty and 27.474 for response time prediction, improving on the provided baseline. The framework’s adaptability is demonstrated on audio recordings of 3rd-8th graders from the Atlanta, Georgia area responding to the Test of Narrative Language. These results highlight the framework’s potential to enhance test development efficiency.

Which Side Are You On? Investigating Politico-Economic Bias in Nepali Language Models
Surendrabikram Thapa | Kritesh Rauniyar | Ehsan Barkhordar | Hariram Veeramani | Usman Naseem
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association

Language models are trained on vast datasets sourced from the internet, which inevitably contain biases that reflect societal norms, stereotypes, and political inclinations. These biases can manifest in model outputs, influencing a wide range of applications. While there has been extensive research on bias detection and mitigation in large language models (LLMs) for widely spoken languages like English, there is a significant gap when it comes to low-resource languages such as Nepali. This paper addresses this gap by investigating the political and economic biases present in five fill-mask models and eleven generative models trained for the Nepali language. To assess these biases, we translated the Political Compass Test (PCT) into Nepali and evaluated the models’ outputs along social and economic axes. Our findings reveal distinct biases across models, with small LMs showing a right-leaning economic bias, while larger models exhibit more complex political orientations, including left-libertarian tendencies. This study emphasizes the importance of addressing biases in low-resource languages to promote fairness and inclusivity in AI-driven technologies. Our work provides a foundation for future research on bias detection and mitigation in underrepresented languages like Nepali, contributing to the broader goal of creating more ethical AI systems.

2023

Automated Citation Function Classification and Context Extraction in Astrophysics: Leveraging Paraphrasing and Question Answering
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the Second Workshop on Information Extraction from Scientific Publications

ADEPT: Adapter-based Efficient Prompt Tuning Approach for Language Models
Aditya Shah | Surendrabikram Thapa | Aneesh Jain | Lifu Huang
Proceedings of the Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

Temporal Tides of Emotional Resonance: A Novel Approach to Identify Mental Health on Social Media
Usman Naseem | Surendrabikram Thapa | Qi Zhang | Junaid Rashid | Liang Hu | Mehwish Nasim
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media

Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

In the evolving landscape of Environmental, Social, and Corporate Governance (ESG) impact assessment, the ML-ESG-2 shared task proposes identifying ESG impact types. To address this challenge, we present a comprehensive system leveraging ensemble learning techniques, capitalizing on early and late fusion approaches. Our approach employs four distinct models: mBERT, FlauBERT-base, ALBERT-base-v2, and a Multi-Layer Perceptron (MLP) incorporating Latent Semantic Analysis (LSA) and Term Frequency-Inverse Document Frequency (TF-IDF) features. Through extensive experimentation, we find that our early fusion ensemble approach, featuring the integration of LSA, TF-IDF, mBERT, FlauBERT-base, and ALBERT-base-v2, delivers the best performance. Our system offers a comprehensive ESG impact type identification solution, contributing to the responsible and sustainable decision-making processes vital in today’s financial and corporate governance landscape.

Breaking Barriers: Exploring the Diagnostic Potential of Speech Narratives in Hindi for Alzheimer’s Disease
Kritesh Rauniyar | Shuvam Shiwakoti | Sweta Poudel | Surendrabikram Thapa | Usman Naseem | Mehwish Nasim
Proceedings of the 5th Clinical Natural Language Processing Workshop

Alzheimer’s Disease (AD) is a neurodegenerative disorder that affects cognitive abilities and memory, especially in older adults. One of the challenges of AD is that it can be difficult to diagnose in its early stages. However, recent research has shown that changes in language, including speech decline and difficulty in processing information, can be important indicators of AD and may help with early detection. Hence, the speech narratives of patients can be useful in diagnosing the early stages of Alzheimer’s disease. While previous work has shown the potential of using speech narratives to diagnose AD in high-resource languages, this work explores the possibility of doing so in a low-resource language, Hindi. In this paper, we present a dataset specifically for analyzing AD in the Hindi language, along with experimental results using various state-of-the-art algorithms to assess the diagnostic potential of speech narratives in Hindi. Our analysis suggests that speech narratives in the Hindi language have the potential to aid in the diagnosis of AD. Our dataset and code are made publicly available at https://github.com/rkritesh210/DementiaBankHindi.

Reducing Knowledge Noise for Improved Semantic Analysis in Biomedical Natural Language Processing Applications
Usman Naseem | Surendrabikram Thapa | Qi Zhang | Liang Hu | Anum Masood | Mehwish Nasim
Proceedings of the 5th Clinical Natural Language Processing Workshop

Graph-based techniques have gained traction for representing and analyzing data in various natural language processing (NLP) tasks. Knowledge graph-based language representation models have shown promising results in leveraging domain-specific knowledge for NLP tasks, particularly in the biomedical NLP field. However, such models have limitations, including knowledge noise and neglect of contextual relationships, leading to potential semantic errors and reduced accuracy. To address these issues, this paper proposes two novel methods. The first method combines a knowledge graph-based language model with nearest-neighbor models to incorporate semantic and category information from neighboring instances. The second method integrates a knowledge graph-based language model with graph neural networks (GNNs) to leverage feature information from neighboring nodes in the graph. Experiments on relation extraction (RE) and classification tasks in English and Chinese language datasets demonstrate significant performance improvements with both methods, highlighting their potential for enhancing the performance of language models and improving NLP applications in the biomedical domain.

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Osman Mutlu | Surendrabikram Thapa | Fiona Anting Tan | Erdem Yörük
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

We provide a summary of the sixth edition of the CASE workshop, held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to progress in text-based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.

Multimodal Hate Speech Event Detection - Shared Task 4, CASE 2023
Surendrabikram Thapa | Farhan Jafri | Ali Hürriyetoğlu | Francielle Vargas | Roy Ka-Wei Lee | Usman Naseem
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Ensuring the moderation of hate speech and its targets emerges as a critical imperative within contemporary digital discourse. To this end, the shared task Multimodal Hate Speech Event Detection was organized at the sixth CASE workshop, co-located with RANLP 2023. The shared task has two subtasks. Subtask A required participants to treat hate speech detection as a binary problem, i.e., to detect whether a given text-embedded image contains hate speech. Similarly, Subtask B required participants to identify the targets of the hate speech, namely individual, community, and organization targets, in text-embedded images. For both subtasks, participants were ranked on the basis of the F1-score. The best F1-scores in Subtask A and Subtask B were 85.65 and 76.34, respectively. This paper provides a comprehensive overview of the performance of the 13 teams that submitted results in Subtask A and the 10 teams in Subtask B.

Event Causality Identification - Shared Task 3, CASE 2023
Fiona Anting Tan | Hansi Hettiarachchi | Ali Hürriyetoğlu | Nelleke Oostdijk | Onur Uca | Surendrabikram Thapa | Farhana Ferdousi Liza
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The Event Causality Identification Shared Task of CASE 2023 is the second iteration of a shared task centered around the Causal News Corpus. Two subtasks were involved: In Subtask 1, participants were challenged to predict if a sentence contains a causal relation or not. In Subtask 2, participants were challenged to identify the Cause, Effect, and Signal spans given an input causal sentence. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper includes an overview of the work of the ten teams that submitted their results to our competition and the six system description papers that were received. The highest F1 scores achieved for Subtask 1 and 2 were 84.66% and 72.79%, respectively.

Assessing Political Inclination of Bangla Language Models
Surendrabikram Thapa | Ashwarya Maratha | Khan Md Hasib | Mehwish Nasim | Usman Naseem
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Natural language processing has advanced with AI-driven language models (LMs), which are applied widely from text generation to question answering. These models are pre-trained on a wide spectrum of data sources, enhancing accuracy and responsiveness. However, this process inadvertently entails the absorption of a diverse spectrum of viewpoints inherent within the training data. Exploring political leaning within LMs due to such viewpoints remains a less-explored domain. In the context of a low-resource language like Bangla, this area of research is nearly non-existent. To bridge this gap, we comprehensively analyze biases present in Bangla language models, specifically focusing on social and economic dimensions. Our findings reveal the inclinations of various LMs, which will provide insights into ethical considerations and limitations associated with deploying Bangla LMs.

LowResourceNLU at BLP-2023 Task 1 & 2: Enhancing Sentiment Classification and Violence Incitement Detection in Bangla Through Aggregated Language Models
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Violence incitement detection and sentiment analysis hold significant importance in the field of natural language processing. However, in the case of the Bangla language, there are unique challenges due to its low-resource nature. In this paper, we address these challenges by presenting an innovative approach that leverages aggregated BERT models for two tasks at the BLP workshop in EMNLP 2023, specifically tailored for Bangla. Task 1 focuses on violence-inciting text detection, while task 2 centers on sentiment analysis. Our approach combines fine-tuning with textual entailment (utilizing BanglaBERT), Masked Language Model (MLM) training (making use of BanglaBERT), and the use of standalone Multilingual BERT. This comprehensive framework significantly enhances the accuracy of sentiment classification and violence incitement detection in Bangla text. Our method achieved the 11th rank in task 1 with an F1-score of 73.47 and the 4th rank in task 2 with an F1-score of 71.73. This paper provides a detailed system description along with an analysis of the impact of each component of our framework.

LowResContextQA at Qur’an QA 2023 Shared Task: Temporal and Sequential Representation Augmented Question Answering Span Detection in Arabic
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

The Qur’an holds immense theological and historical significance, and developing a technology-driven solution for answering questions from this sacred text is of paramount importance. This paper presents our approach to task B of Qur’an QA 2023, part of EMNLP 2023, addressing this challenge by proposing a robust method for extracting answers from Qur’anic passages. Leveraging the Qur’anic Reading Comprehension Dataset (QRCD) v1.2, we employ innovative techniques and advanced models to improve the precision and contextuality of answers derived from Qur’anic passages. Our methodology encompasses the utilization of start and end logits, Long Short-Term Memory (LSTM) networks, and fusion mechanisms, contributing to the ongoing dialogue at the intersection of technology and spirituality.

DialectNLU at NADI 2023 Shared Task: Transformer Based Multitask Approach Jointly Integrating Dialect and Machine Translation Tasks in Arabic
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

With approximately 400 million speakers worldwide, Arabic ranks as the fifth most-spoken language globally, necessitating advancements in natural language processing. This paper addresses this need by presenting a system description of the approaches employed for the subtasks outlined in the Nuanced Arabic Dialect Identification (NADI) task at EMNLP 2023. For the first subtask, involving closed country-level dialect identification classification, we employ an ensemble of two Arabic language models. Similarly, for the second subtask, focused on closed dialect to Modern Standard Arabic (MSA) machine translation, our approach combines sequence-to-sequence models, all trained on an Arabic-specific dataset. Our team ranks 10th and 3rd on subtask 1 and subtask 2 respectively.

KnowTellConvince at ArAIEval Shared Task: Disinformation and Persuasion Detection in Arabic using Similar and Contrastive Representation Alignment
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem
Proceedings of ArabicNLP 2023

In an era of widespread digital communication, the challenge of identifying and countering disinformation has become increasingly critical. However, compared to the solutions available in the English language, the resources and strategies for tackling this multifaceted problem in Arabic are relatively scarce. To address this issue, this paper presents our solutions to tasks in ArAIEval 2023. Task 1 focuses on detecting persuasion techniques, while Task 2 centers on disinformation detection within Arabic text. Leveraging a multi-head model architecture, fine-tuning techniques, sequential learning, and innovative activation functions, our contributions significantly enhance the accuracy of persuasion technique and disinformation detection. Beyond improving performance, our work fills a critical research gap in content analysis for Arabic, empowering individuals, communities, and digital platforms to combat deceptive content effectively and preserve the credibility of information sources within the Arabic-speaking world.

2022

A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict
Surendrabikram Thapa | Aditya Shah | Farhan Jafri | Usman Naseem | Imran Razzak
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

This paper presents a new multi-modal dataset for identifying hateful content on social media, consisting of 5,680 text-image pairs collected from Twitter, labeled across two labels. Experimental analysis of the presented dataset has shown that understanding both modalities is essential for detecting these techniques. It is confirmed in our experiments with several state-of-the-art multi-modal models. In future work, we plan to extend the dataset in size. We further plan to develop new multi-modal models tailored explicitly to hate-speech detection, aiming for a deeper understanding of the text and image relation. It would also be interesting to perform experiments in a direction that explores what social entities the given hate speech tweet targets.

Co-authors

Surabhi Adhikari 10

Shuvam Shiwakoti 8

Mehwish Nasim 4

Siddhant Bikram Shah 4

Bal Krishna Bal 3

Ali Hürriyetoğlu 3

Kristina T. Johnson 3

Kengatharaiyer Sarveswaran 3

Francielle Vargas 3

Idris Abdulmumin 2

Cengiz Acarturk 2

Ibrahim Said Ahmad 2

Adem Chanie Ali 2

Abinew Ali Ayele 2

Ehsan Barkhordar 2

Chris Biemann 2

Tanmoy Chakraborty 2

Robert Geislinger 2

Aung Kyaw Htet 2

Dheeraj Kodati 2

Ashwarya Maratha 2

Sahar Moradizeyveh 2

Shamsuddeen Hassan Muhammad 2

Shantipriya Parida 2

Ihsan Ayyub Qazi 2

Martin Semmann 2

Clemencia Siro 2

Marco Antonio Stranisci 2

Fiona Anting Tan 2

Elena Tutubalina 2

Gökçe Uludoğan 2

Ashwini Vaidya 2

Lilian Diana Awuor Wanzare 2

Seid Muhie Yimam 2

Ameeta Agrawal 1

Syed Ishtiaque Ahmed 1

Emily Allaway 1

Richard He Bai 1

Aashish Bhandari 1

Alessandra Teresa Cignarella 1

Somaiyeh Dehghan 1

Daryna Dementieva 1

Simona Frenda 1

Rudy Garrido Veliz 1

Sophia Simeng Han 1

Md. Arid Hasan 1

Khan Md Hasib 1

Hansi Hettiarachchi 1

Ali Hürriyetoğlu 1

Farhan Ahmad Jafri 1

Satya Keerthi 1

Jane Wanjiru Kimani 1

Lakshmojee Koduru 1

Guneet Singh Kohli 1

Roy Ka-Wei Lee 1

Peerat Limkonchotiwat 1

Farhana Ferdousi Liza 1

Sujal Maharjan 1

Lesly Miculicich Werlen 1

Meryem M’hamdi 1

Nelson Odhiambo 1

Nelson Odhiambo Onyango 1

Nelleke Oostdijk 1

Junaid Rashid 1

Natarajan Balaji Shankar 1

Astha Shrestha 1

Andrew J Smart 1

Santosh T.Y.S.S. 1

Jackson Trager 1

Rudy Alexandro Garrido Veliz 1

Erdem Yörük 1

MD Arfeen Zeeshan 1

Venues