2025
BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects
Jakir Hasan | Shubhashis Roy Dipta
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Real-time speech assistants are becoming increasingly popular for ensuring improved accessibility to information. Bengali, being a low-resource language with high regional dialectal diversity, has seen limited progress in developing such systems, and existing systems are not optimized for real-time use and focus only on standard Bengali. In this work, we present BanglaTalk, the first real-time speech assistance system for Bengali regional dialects. BanglaTalk follows a client-server architecture and uses the Real-time Transport Protocol (RTP) to ensure low-latency communication. To address dialectal variation, we introduce a dialect-aware ASR system, BRDialect, developed by fine-tuning the IndicWav2Vec model on ten Bengali regional dialects. It outperforms the baseline ASR models by 12.41-33.98% on the RegSpeech12 dataset. Furthermore, BanglaTalk can operate at a low bandwidth of 24 kbps while maintaining an average end-to-end delay of 4.9 seconds. Low bandwidth usage and minimal end-to-end delay make the system both cost-effective and interactive for real-time use cases, enabling inclusive and accessible speech technology for the diverse community of Bengali speakers.
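As a rough illustration of the RTP transport described above, the sketch below packs the 12-byte RTP header from RFC 3550 and streams fixed-size audio frames over UDP. The frame size (60 bytes per 20 ms, i.e. 24 kbps), payload type, address, and clock rate are illustrative assumptions, not BanglaTalk's actual configuration.

```python
# Minimal RTP audio sender sketch (RFC 3550 fixed header), assuming a
# hypothetical codec that emits 60-byte frames every 20 ms (= 24 kbps).
import socket
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96) -> bytes:
    # First byte 0x80: version 2, no padding, no extension, zero CSRCs.
    # Second byte: marker bit clear + 7-bit payload type (96 = dynamic).
    header = struct.pack("!BBHII", 0x80, payload_type,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

audio_frames = [b"\x00" * 60 for _ in range(5)]  # placeholder encoded frames
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq, ts, ssrc = 0, 0, 0x12345678
for frame in audio_frames:
    sock.sendto(rtp_packet(frame, seq, ts, ssrc), ("127.0.0.1", 5004))
    seq += 1   # sequence number lets the receiver detect loss/reordering
    ts += 320  # 20 ms at a 16 kHz clock; the real increment is codec-defined
```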
Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles
Nusrat Jahan Lia | Shubhashis Roy Dipta | Abdullah Khan Zehady | Naymul Islam | Madhusodan Chakraborty | Abdullah Al Wasif
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Detecting media bias is crucial, particularly in the South Asian region, yet annotated datasets and computational studies for Bangla political bias research remain scarce. This scarcity is partly due to the difficulty of the task: political stance detection in Bangla news requires an understanding of linguistic cues, cultural context, subtle biases, rhetorical strategies, code-switching, implicit sentiment, and socio-political background. To address this, we introduce the first benchmark dataset of 200 politically significant and highly debated Bangla news articles, labeled for government-leaning, government-critique, and neutral stances, alongside diagnostic analyses for evaluating large language models (LLMs). Our comprehensive evaluation of 28 proprietary and open-source LLMs shows strong performance in detecting government-critique content (F1 up to 0.83) but substantial difficulty with neutral articles (F1 as low as 0.00). Models also tend to over-predict government-leaning stances, often misinterpreting ambiguous narratives. This dataset and its associated diagnostics provide a foundation for advancing stance detection in Bangla media research and offer insights for improving LLM performance in low-resource languages.
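The per-stance diagnostics mentioned above can be reproduced in outline with a standard classification report; the sketch below uses placeholder predictions, not the paper's data.

```python
# Per-stance precision/recall/F1 over the three labels, assuming gold and
# predicted stances are available as plain string lists (toy values here).
from sklearn.metrics import classification_report

LABELS = ["government-leaning", "government-critique", "neutral"]
gold = ["neutral", "government-critique", "government-leaning", "neutral"]
pred = ["government-leaning", "government-critique", "government-leaning", "neutral"]

# zero_division=0 mirrors the reported F1 of 0.00 when a class is never predicted.
print(classification_report(gold, pred, labels=LABELS, zero_division=0))
```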
PromptGuard at BLP-2025 Task 1: A Few-Shot Classification Framework Using Majority Voting and Keyword Similarity for Bengali Hate Speech Detection
Rakib Hossan | Shubhashis Roy Dipta
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
BLP-2025 Task 1A requires classifying Bengali hate speech into six categories. Traditional supervised approaches need extensive labeled datasets, which are expensive to build for low-resource languages. We developed PromptGuard, a few-shot framework combining chi-square statistical analysis for keyword extraction with adaptive majority voting for decision-making. We explore statistical keyword selection versus random approaches, as well as adaptive voting mechanisms that extend classification based on consensus quality. Chi-square keywords provide consistent improvements across categories, while adaptive voting benefits ambiguous cases requiring extended classification rounds. PromptGuard achieves a micro-F1 of 67.61, outperforming n-gram baselines (60.75) and random approaches (14.65). Ablation studies confirm that chi-square-based keywords have the most consistent impact across all categories.
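A minimal sketch of the two components named above, under stated assumptions: chi-square keyword scoring via scikit-learn, and an adaptive majority vote that keeps sampling a classifier until consensus is clear. `classify_once` is a hypothetical stand-in for the few-shot LLM call, and the label set here is illustrative rather than the task's six categories.

```python
import random
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

def top_keywords(texts, labels, k=10):
    # Score each vocabulary item by its chi-square dependence on the label.
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    scores, _ = chi2(X, labels)
    vocab = vec.get_feature_names_out()
    return [vocab[i] for i in scores.argsort()[::-1][:k]]

def classify_once(text):
    return random.choice(["hate", "non-hate"])  # hypothetical LLM call

def adaptive_vote(text, min_rounds=3, max_rounds=9, margin=2):
    votes = Counter(classify_once(text) for _ in range(min_rounds))
    while sum(votes.values()) < max_rounds:
        ranked = votes.most_common(2)
        if len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin:
            break  # clear consensus: stop early
        votes[classify_once(text)] += 1  # ambiguous case: add a round
    return votes.most_common(1)[0][0]
```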
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
Shubhashis Roy Dipta | Francis Ferraro
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Recent approaches have shown impressive proficiency in extracting and leveraging parametric knowledge from Large Language Models (LLMs) and Vision-Language Models (VLMs). In this work, we consider how to improve the retrieval of videos related to complex real-world events by automatically extracting latent parametric knowledge about those events. We present Q2E: a Query-to-Event decomposition method for zero-shot multilingual text-to-video retrieval, adaptable across datasets, domains, LLMs, and VLMs. Our approach demonstrates that we can enhance the understanding of otherwise overly simplified human queries by decomposing the query using the knowledge embedded in LLMs and VLMs. We additionally show how to apply our approach to both visual and speech-based inputs. To combine this varied multimodal knowledge, we adopt entropy-based fusion scoring for zero-shot fusion. Q2E outperforms the previous SOTA on the MultiVENT dataset by 8 NDCG points, while improving on MSR-VTT and MSVD by 4 and 3 points, respectively, surpassing several existing retrieval methods, including many fine-tuned and SOTA zero-shot approaches. We have released both code and data.
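One plausible reading of the entropy-based fusion step is sketched below: each modality's candidate scores are softmax-normalized, and modalities whose score distributions are flatter (higher entropy, less decisive) receive lower fusion weight. This is an illustration of the idea, not the paper's exact formula.

```python
import numpy as np

def entropy_weight(scores: np.ndarray) -> float:
    p = np.exp(scores - scores.max())
    p /= p.sum()                        # softmax over candidate videos
    H = -(p * np.log(p + 1e-12)).sum()  # Shannon entropy of that distribution
    return 1.0 / (1.0 + H)              # flatter scores -> lower weight

def fuse(modality_scores: dict[str, np.ndarray]) -> np.ndarray:
    weights = {m: entropy_weight(s) for m, s in modality_scores.items()}
    total = sum(weights.values())
    return sum(weights[m] / total * s for m, s in modality_scores.items())

vision = np.array([0.9, 0.2, 0.1])     # decisive modality: up-weighted
speech = np.array([0.40, 0.50, 0.45])  # near-flat modality: down-weighted
print(fuse({"vision": vision, "speech": speech}))
```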
If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition
Shubhashis Roy Dipta | Francis Ferraro
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Prior work has shown that presupposition in generated questions can introduce unverified assumptions, leading to inconsistencies in claim verification. Additionally, prompt sensitivity remains a significant challenge for large language models (LLMs), resulting in performance variance as high as 3–6%. While recent advancements have reduced this gap, our study demonstrates that prompt sensitivity remains a persistent issue. To address this, we propose a structured and robust claim verification framework that reasons through presupposition-free, decomposed questions. Extensive experiments across multiple prompts, datasets, and LLMs reveal that even state-of-the-art models remain susceptible to prompt variance and presupposition. Our method consistently mitigates these issues, achieving up to a 2–5% improvement.
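A schematic of the verification loop described above, with the LLM calls stubbed out; the prompt wording and the all-questions-must-pass aggregation are illustrative assumptions, not the paper's exact templates.

```python
def llm(prompt: str) -> str:
    return "yes"  # hypothetical stand-in; wire up a real model client here

def decompose(claim: str) -> list[str]:
    # Ask for questions that do not presuppose the claim's truth,
    # e.g. "Did X happen?" rather than "When did X happen?".
    out = llm("Break this claim into yes/no questions that presuppose "
              "nothing beyond the given evidence:\n" + claim)
    return [q.strip() for q in out.splitlines() if q.strip()]

def verify(claim: str, evidence: str) -> bool:
    answers = [
        llm(f"Evidence: {evidence}\nQuestion: {q}\nAnswer yes/no/unknown:")
        for q in decompose(claim)
    ]
    # Support the claim only if every decomposed question checks out.
    return all(a.strip().lower().startswith("yes") for a in answers)

print(verify("The bridge opened in 2020.", "The bridge opened to traffic in 2020."))
```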
2024
HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?
Shubhashis Roy Dipta | Sadat Shahriar
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper describes our system developed for SemEval-2024 Task 8, “Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection.” Machine-generated text has become a major concern due to the use of large language models (LLMs) in fake text generation, phishing, cheating on exams, and even plagiarizing copyrighted material. Many systems have been developed to detect machine-generated text; however, the majority of these systems rely on knowing the text-generating model. This limitation is impractical in real-world scenarios, as it is often impossible to know which specific model the user has used for text generation. In this work, we propose a single model based on contrastive learning, which uses ~40% of the baseline’s parameters (149M vs. 355M) but shows comparable performance on the test dataset (21st out of 137 participants). Our key finding is that even without an ensemble of multiple models, a single base model can achieve comparable performance with the help of data augmentation and contrastive learning.
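As a generic instance of the contrastive objective mentioned above, the sketch below computes an InfoNCE-style loss over paired text embeddings; the encoder, batch construction, and dimensions are assumptions, not the submitted system.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature     # scaled cosine similarities
    targets = torch.arange(a.size(0))  # i-th anchor pairs with i-th positive
    return F.cross_entropy(logits, targets)

# Toy batch: 4 anchor embeddings and slightly perturbed "augmented" positives.
anchors = torch.randn(4, 16)
positives = anchors + 0.1 * torch.randn(4, 16)
print(info_nce(anchors, positives).item())
```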
UMBCLU at SemEval-2024 Task 1: Semantic Textual Relatedness with and without machine translation
Shubhashis Roy Dipta | Sai Vallurupalli
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The aim of SemEval-2024 Task 1, “Semantic Textual Relatedness for African and Asian Languages,” is to develop models for identifying semantic textual relatedness (STR) between two sentences across multiple languages (14 African and Asian languages) and settings (supervised, unsupervised, and cross-lingual). Large language models (LLMs) have shown impressive performance on several natural language understanding tasks such as multilingual machine translation (MMT), semantic similarity (STS), and encoding sentence embeddings. Using a combination of LLMs that perform well on these tasks, we developed two STR models, TranSem and FineSem, for the supervised and cross-lingual settings. We explore the effectiveness of several training methods and the usefulness of machine translation. We find that direct fine-tuning on the task is comparable to using sentence embeddings, and that translating to English leads to better performance for some languages. In the supervised setting, our model performs better than the official baseline for 3 languages, with the remaining 4 performing on par. In the cross-lingual setting, our model performs better than the baseline for 3 languages (placing 1st for Afrikaans and 2nd for Indonesian), is on par for 2 languages, and performs poorly on the remaining 7 languages.
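One of the strategies above (translate to English, embed, score by cosine similarity) can be sketched as follows; the checkpoint name is a common public model used only for illustration, and `translate` is a hypothetical stand-in for an MT system.

```python
from sentence_transformers import SentenceTransformer, util

def translate(sentence: str, src_lang: str) -> str:
    return sentence  # hypothetical MT call; identity here so the sketch runs

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

def relatedness(s1: str, s2: str, src_lang: str) -> float:
    e1, e2 = model.encode([translate(s1, src_lang), translate(s2, src_lang)])
    return float(util.cos_sim(e1, e2))  # cosine similarity as the STR score

print(relatedness("The cat sat.", "A cat was sitting.", src_lang="eng"))
```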
2023
Semantically-informed Hierarchical Event Modeling
Shubhashis Roy Dipta | Mehdi Rezaee | Francis Ferraro
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
Prior work has shown that coupling sequential latent variable models with semantic ontological knowledge can improve the representational capabilities of event modeling approaches. In this work, we present a novel, doubly hierarchical, semi-supervised event modeling framework that provides structural hierarchy while also accounting for ontological hierarchy. Our approach consists of multiple layers of structured latent variables, where each successive layer compresses and abstracts the previous layers. We guide this compression through the injection of structured ontological knowledge defined at the type level of events: importantly, our model allows for partial injection of semantic knowledge and does not depend on observing instances at any particular level of the semantic ontology. Across two different datasets and four different evaluation metrics, we demonstrate that our approach is able to outperform the previous state-of-the-art approaches by up to 8.5%, demonstrating the benefits of structured and semantic hierarchical knowledge for event modeling.
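One plausible form of a doubly hierarchical objective like the one described above is a two-layer ELBO in which the upper latent layer z_2 compresses the lower layer z_1, which in turn generates the observed events x; this is a generic sketch, not the paper's exact derivation.

```latex
% Generic two-layer hierarchical ELBO (illustrative, not the paper's objective):
\mathcal{L}
  = \mathbb{E}_{q(z_1 \mid x)}\left[\log p(x \mid z_1)\right]
  - \mathbb{E}_{q(z_2 \mid z_1)}\left[\mathrm{KL}\left(q(z_1 \mid x)\,\|\,p(z_1 \mid z_2)\right)\right]
  - \mathrm{KL}\left(q(z_2 \mid z_1)\,\|\,p(z_2)\right)
```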