Amrit Singh Bedi
2026
Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs
James Beetham | Souradip Chakraborty | Mengdi Wang | Furong Huang | Amrit Singh Bedi | Mubarak Shah
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) are safety-aligned to prevent harmful response generation, yet they remain vulnerable to jailbreak attacks. While prior works have focused on improving jailbreak attack effectiveness, they offer little explanation for why safety alignment fails. We address this gap by framing jailbreaks as inference-time alignment, connecting attack design and safety alignment within a unified optimization framework. This framing allows us to extend best-of-N inference-time alignment to the adversarial setting, yielding LIAR (Leveraging Inference-time Alignment to jailbReak), and to derive suboptimality bounds showing that LIAR provably approaches an optimal jailbreak as compute scales. Interestingly, our framework also lets us develop the notion of a Safety-Net, a measure of how vulnerable an LLM is to jailbreaks, which helps explain why safety alignment can fail. Empirically, LIAR produces natural, hard-to-detect prompts that achieve a competitive attack success rate while running 10 to 100x faster than prior suffix-based jailbreaks.
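The best-of-N selection idea behind this abstract can be illustrated with a minimal sketch: sample N candidates and keep the one a scoring function ranks highest. The candidate strings and the score table below are hypothetical stand-ins for an attacker model's sampled prompts and a judge/reward model; this is not the paper's implementation.

```python
def best_of_n(candidates, score):
    """Best-of-N selection: return the candidate the scorer ranks highest.

    `score` stands in for whatever judge or reward model rates candidates;
    more compute (larger N) means more candidates to choose among.
    """
    return max(candidates, key=score)


# Hypothetical candidates and scores, purely for illustration.
candidates = ["prompt A", "prompt B", "prompt C"]
scores = {"prompt A": 0.2, "prompt B": 0.9, "prompt C": 0.5}

best = best_of_n(candidates, scores.get)  # -> "prompt B"
```

As N grows, the expected score of the selected candidate can only improve, which is the intuition behind the suboptimality bounds the abstract mentions.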
2025
Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems
Aakriti Agrawal | Rohith Aralikatti | Anirudh Satheesh | Souradip Chakraborty | Amrit Singh Bedi | Furong Huang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) have demonstrated exceptional capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge, particularly in resource-constrained settings. Existing approaches often depend on costly external verifiers, human evaluators, or self-consistency techniques that require multiple samples from a single model. While multi-LLM systems produce more diverse responses than single models and thus have greater potential, they often underperform compared to single LLM self-consistency. In this work, we propose a calibrated log-likelihood-based selection framework to improve multi-LLM performance. Our approach leverages uncertainty estimation to identify the most confident response while minimizing inference costs. We show that our method outperforms majority voting and exceeds self-consistency performance when using a large number of model calls. Through extensive experiments, we demonstrate improvements of approx. 4%, 3%, and 5% on GSM8K, MMLU, and ARC, respectively, when applying uncertainty-aware selection to multi-LLM systems.
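A minimal sketch of log-likelihood-based answer selection across models, assuming each response comes with per-token log-probabilities: pick the response with the highest mean token log-probability as a simple length-normalized confidence score. The data structure and function name here are illustrative, and the calibration step described in the paper is not modeled.

```python
def select_by_loglik(responses):
    """responses: list of (text, token_logprobs) pairs, one per model.

    Returns the text whose mean token log-probability is highest,
    i.e. the response its model is most confident in.
    """
    def mean_logprob(item):
        _, logprobs = item
        return sum(logprobs) / len(logprobs)

    return max(responses, key=mean_logprob)[0]


# Hypothetical outputs from two models with their token log-probs.
responses = [
    ("answer from model 1", [-0.5, -0.4, -0.6]),  # mean -0.50
    ("answer from model 2", [-0.1, -0.2]),        # mean -0.15
]
chosen = select_by_loglik(responses)  # -> "answer from model 2"
```

Unlike majority voting, this needs only one sample per model, which matches the abstract's emphasis on minimizing inference cost.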
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Ankita Shukla | Sandeep Kumar | Amrit Singh Bedi | Tanmoy Chakraborty
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Findings of the MMLoSo 2025 Shared Task on Machine Translation into Tribal Languages
Pooja Singh | Sandeep Chatterjee | Gullal S. Cheema | Amrit Singh Bedi | Tanmoy Chakraborty | Sandeep Kumar | Ankita Shukla
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
This paper presents the findings of the MMLoSo Shared Task on Machine Translation. The competition features four tribal languages from India: Bhili, Mundari, Gondi, and Santali, each with 20,000 high-quality parallel sentence pairs and a 16,000-sentence evaluation set. A total of 18 teams submitted across all language pairs. The shared task addresses the challenges of translating India’s severely low-resource tribal languages, which, despite having millions of speakers, remain digitally marginalized due to limited textual resources, diverse scripts, rich morphology, and minimal publicly available parallel corpora. Systems were ranked using a weighted composite score combining BLEU (60%) and chrF (40%) to balance structural accuracy and character-level fluency. The best-performing system leveraged IndicTrans2 with directional LoRA adapters and reverse-model reranking. This work establishes the first reproducible benchmark for machine translation in these tribal languages. All datasets, baseline models, and system outputs are publicly released to support continued research in India’s tribal language technologies.
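The ranking metric stated in the abstract, a weighted composite of BLEU (60%) and chrF (40%), can be written out directly. The function name and the sample scores below are illustrative; both metrics are assumed to be on the same 0 to 100 scale.

```python
def composite_score(bleu: float, chrf: float) -> float:
    """Weighted composite used for ranking: 60% BLEU + 40% chrF.

    Both inputs are assumed to be on a 0-100 scale.
    """
    return 0.6 * bleu + 0.4 * chrf


# Illustrative values only, not taken from the shared task results.
score = composite_score(bleu=30.0, chrf=50.0)  # -> 38.0
```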