Tanmana Sadhu

2025

Open-world planning with incomplete knowledge is crucial for real-world embodied AI tasks. Despite that, existing LLM-based planners struggle with long chains of sequential reasoning, while symbolic planners face combinatorial explosion of states and actions for complex domains due to reliance on grounding. To address these deficiencies, we introduce LLM-Regress, an open-world planning approach integrating lifted regression with LLM-generated affordances. LLM-Regress generates sound and complete plans in a compact lifted form, avoiding exhaustive enumeration of irrelevant states and actions. Additionally, it makes efficient use of LLMs to infer goal-related objects and affordances without the need to predefine all possible objects and affordances. We conduct extensive experiments on three benchmarks and show that LLM-Regress significantly outperforms state-of-the-art LLM planners and a grounded planner using LLM-generated affordances.

pdf bib abs

VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings
Tanmana Sadhu | Yanan Chen | Ali Pesaranghader
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large language models (LLMs) are applied to reasoning and (automated) planning across diverse domains, from travel itineraries to embodied AI tasks. However, concerns have been raised about their suitability for long-horizon tasks involving multiple constraints, as they are prone to hallucinations, particularly in adversarial scenarios. Safety reasoning also becomes critical for embodied AI agents, which interact with their physical environments to complete tasks on behalf of humans. However, existing (safety) benchmarks fail to represent a diverse range of multi-constraint tasks that require long-horizon planning with a focus on safety. To address this, we propose VESTABENCH, a benchmark curated using VirtualHome and BEHAVIOR-100. Our VESTABENCH includes (1) tasks that can be achieved safely under adversarial and multi-constraint settings, as well as (2) adversarial instructions that the agent must avoid. Our experiments with state-of-the-art LLM-based baselines reveal that they perform poorly against our tasks, not only achieving low success rates but also suffering significantly compromised safety outcomes. This observation reinforces the limitations of LLMs in generating safe plans when faced with adversarial settings or instructions. Finally, we believe that our findings benefit the research and industry communities.

2024

pdf bib abs

Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Tanmana Sadhu | Ali Pesaranghader | Yanan Chen | Dong Hoon Yi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the Athena framework which leverages the concept of verbal contrastive learning where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to guide the agent to prevent risky actions at every step. Furthermore, due to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates verbal contrastive learning and interaction-level critiquing improve the safety rate significantly.

2022

pdf bib abs

Exploring Compositional Image Retrieval with Hybrid Compositional Learning and Heuristic Negative Mining
Chao Wang | Ehsan Nezhadarya | Tanmana Sadhu | Shengdong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Compositional image retrieval (CIR) is a challenging retrieval task, where the query is composed of a reference image and a modification text, and the target is another image reflecting the modification to the reference image. Due to the great success of the pre-trained vision-and-language model CLIP and its favorable applicability to large-scale retrieval tasks, we propose a CIR model HyCoLe-HNM with CLIP as the backbone. In HyCoLe-HNM, we follow the contrastive pre-training method of CLIP to perform cross-modal representation learning. On this basis, we propose a hybrid compositional learning mechanism, which includes both image compositional learning and text compositional learning. In hybrid compositional learning, we borrow a gated fusion mechanism from a question answering model to perform compositional fusion, and propose a heuristic negative mining method to filter negative samples. Privileged information in the form of image-related texts is utilized in cross-modal representation learning and hybrid compositional learning. Experimental results show that HyCoLe-HNM achieves state-of-the-art performance on three CIR datasets, namely FashionIQ, Fashion200K, and MIT-States.

Co-authors

Xiaotian Liu 1

Ehsan Nezhadarya 1

Scott Sanner 1

Punyaphat Sukcharoenchaikul 1

Chao Wang 1

Dong Hoon Yi 1

Shengdong Zhang 1

Venues

Fix author