MD Sadik Hossain Shanto

2026

The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMs
Asif Azad | Mohammad Sadat Hossain | MD Sadik Hossain Shanto | M Saifur Rahman | Md Rizwan Parvez
Findings of the Association for Computational Linguistics: EACL 2026

Vision-Language Models (VLMs) have achieved remarkable progress in complex visual understanding across scientific and reasoning tasks. While performance benchmarking has advanced our understanding of these capabilities, the critical dimension of uncertainty quantification has received insufficient attention. Unlike prior conformal prediction studies that focused on limited settings, we conduct a comprehensive uncertainty benchmarking study, evaluating 18 state-of-the-art VLMs (open- and closed-source) across 6 multimodal datasets with 3 distinct scoring functions. For closed-source models lacking token-level log-probability access, we develop and validate instruction-guided likelihood proxies. Our findings demonstrate that larger models consistently exhibit better uncertainty quantification: models that know more also know better what they don’t know. More certain models achieve higher accuracy, while mathematical and reasoning tasks elicit poorer uncertainty performance across all models compared to other domains. This work establishes a foundation for reliable uncertainty evaluation in multimodal systems.

2025

SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
Bijoy Ahmed Saiem | MD Sadik Hossain Shanto | Rakib Ahsan | Md Rafi Ur Rashid
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

As the use of Large Language Models (LLMs) expands, so do concerns about their vulnerability to jailbreak attacks. We introduce SequentialBreak, a novel single-query jailbreak technique that arranges multiple benign prompts in sequence with a hidden malicious instruction among them to bypass safety mechanisms. Sequential prompt chains in a single query can lead LLMs to focus on certain prompts while ignoring others. By embedding a malicious prompt within a prompt chain, we show that LLMs tend to ignore the harmful context and respond to all prompts, including the harmful one. We demonstrate the effectiveness of our attack across diverse scenarios, including Q&A systems, dialogue completion tasks, and level-wise gaming scenarios. This variety of prompt structures suggests that SequentialBreak is adaptable to formats beyond those discussed here. Experiments show that SequentialBreak uses only a single query yet significantly outperforms existing baselines on both open-source and closed-source models. These findings underline the urgent need for more robust defenses against prompt-based attacks. The results and website are available at https://anonymous.4open.science/r/JailBreakAttack-4F3B/.

Co-authors

Md Rafi Ur Rashid 1

Bijoy Ahmed Saiem 1
