Heng Tao Shen
2026
Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models
Xudong Wang | Chaoning Zhang | Chenghao Li | Shuxu Chen | Qigan Sun | Jiaquan Zhang | Fachrina Dewi Puspitasari | Tae-Ho Kim | Jiwei Wei | Malu Zhang | Guoqing Wang | Yang Yang | Heng Tao Shen
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, while recent prompting strategies such as Chain-of-Thought (CoT) have further elevated their performance in handling complex logical problems. Despite these advances, high-quality reasoning remains heavily reliant on manual static prompts and is sensitive to decoding configurations and task distributions, leading to performance fluctuations and limited transferability. Existing automatic prompt optimization methods typically adopt single-agent local search, failing to simultaneously optimize prompts and decoding hyperparameters within a unified framework to achieve stable global improvements. To address this limitation, we propose Agent-GWO, a dynamic prompt optimization framework for complex reasoning. Specifically, we unify prompt templates and decoding hyperparameters as inheritable agent configurations. By leveraging the leader-follower mechanism of the Grey Wolf Optimizer (GWO), we automatically select three leader agents (𝛼, 𝛽, and 𝛿) to guide the collaborative updates of the remaining agents, enabling iterative convergence toward robust optimal reasoning configurations that can be seamlessly integrated for inference. Extensive experiments on multiple mathematical and hybrid reasoning benchmarks across diverse LLM backbones show that Agent-GWO consistently improves accuracy and stability over existing prompt optimization methods.
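The leader-follower update mentioned in the abstract can be illustrated with the standard continuous Grey Wolf Optimizer rule. This is a minimal sketch, not the authors' implementation: it tunes only two numeric decoding hyperparameters (temperature and top_p) against a made-up objective. The names `gwo_minimize` and `toy_objective`, the bounds, and the target values are assumptions for illustration, and the way Agent-GWO updates prompt templates is not modeled.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=8, n_iters=50, seed=0):
    """Standard continuous GWO; returns the best position seen (hypothetical helper)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best = list(min(wolves, key=objective))
    for t in range(n_iters):
        ranked = sorted(wolves, key=objective)
        if objective(ranked[0]) < objective(best):
            best = list(ranked[0])
        # The three best agents act as leaders (alpha, beta, delta).
        leaders = [list(w) for w in ranked[:3]]
        a = 2.0 * (1 - t / n_iters)  # exploration factor, shrinks toward 0
        new_wolves = []
        for x in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                candidates = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    candidates.append(leader[d] - A * D)
                # Average the three leader-guided moves, then clamp to bounds.
                pos.append(min(hi, max(lo, sum(candidates) / 3)))
            new_wolves.append(pos)
        wolves = new_wolves
    final = min(wolves, key=objective)
    return list(final) if objective(final) < objective(best) else best

def toy_objective(cfg):
    # Made-up stand-in for "1 - validation accuracy" of a (temperature, top_p) pair.
    temperature, top_p = cfg
    return (temperature - 0.7) ** 2 + (top_p - 0.9) ** 2

best_cfg = gwo_minimize(toy_objective, bounds=[(0.0, 2.0), (0.0, 1.0)])
```

In a real setting the objective would be an evaluation of the LLM on held-out reasoning problems, which is far more expensive than this toy function, so the population size and iteration count would need to be chosen accordingly.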
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering
Wencheng Ye | Xiaoyang Yuan | Yi Bin | Hengyu Jin | Liang Peng | Pengpeng Zeng | Heng Tao Shen
Findings of the Association for Computational Linguistics: ACL 2026
Recent work on domain-specific reasoning with large language models (LLMs) has largely relied on training-intensive approaches that require updating model parameters. Although activation steering has emerged as a parameter-efficient alternative, existing methods typically rely on static and manually designed interventions, limiting their ability to adapt to the dynamic nature of complex reasoning. To address this limitation, we propose RISER (Router-based Intervention for Steerable Enhancement of Reasoning), a plug-and-play intervention framework that adaptively steers LLM reasoning in activation space. RISER builds a library of reusable reasoning vectors and employs a lightweight Router to dynamically compose these vectors for each input. The Router is optimized via reinforcement learning under task-level rewards, enabling the emergent and compositional activation of latent cognitive primitives. Across seven diverse benchmarks, RISER achieves average zero-shot accuracy improvements of 3.4–6.5% over the base model, while outperforming chain-of-thought-style reasoning with 2–3× higher token efficiency and robust accuracy gains. Further analysis demonstrates that RISER autonomously combines multiple vectors into interpretable and precise control strategies, pointing toward more controllable and efficient LLM reasoning.
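One plausible reading of "a lightweight Router dynamically composes these vectors" is a weighted sum of library vectors added to a hidden activation. The sketch below assumes a linear router followed by a softmax; the abstract does not specify the router architecture, the composition rule, or the layer being steered, so `route_and_steer`, `router_weights`, and `scale` are hypothetical names, and the reinforcement-learning training of the router is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_and_steer(hidden, vectors, router_weights, scale=1.0):
    """Add a router-weighted mix of steering vectors to one hidden activation.

    hidden:         list of floats (a single activation vector)
    vectors:        list of steering vectors, each the same length as hidden
    router_weights: one weight row per steering vector (assumed linear router)
    """
    # One linear score per library vector.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    weights = softmax(logits)
    # Weighted sum of the library vectors, dimension by dimension.
    steer = [
        sum(wt * vec[d] for wt, vec in zip(weights, vectors))
        for d in range(len(hidden))
    ]
    steered = [h + scale * s for h, s in zip(hidden, steer)]
    return steered, weights
```

With an all-zero router the mixing weights are uniform, so every library vector contributes equally; a trained router would instead concentrate weight on the vectors relevant to the input.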
2025
From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs
Shenshen Li | Wenxin Meng | Lei Wang | Hao Yang | Chong Peng | Peng Yan | Fumin Shen | Jingkuan Song | Heng Tao Shen | Xing Xu
Findings of the Association for Computational Linguistics: ACL 2025
Recent progress in large vision-language models (LVLMs) has shown substantial potential across a broad spectrum of third-person tasks. However, adapting these LVLMs to egocentric scenarios remains challenging due to their third-person training bias. Existing methods that adapt LVLMs for first-person tasks often overlook critical agent-environment interactions, limiting their ability to perform egocentric reasoning. To address these challenges, we propose a novel zero-shot paradigm termed Front-Door Adjustments with Uncertainty Calibration (FRUIT) to enhance the egocentric reasoning abilities of LVLMs by simulating human causal reasoning. Specifically, FRUIT operates in two stages: observation and understanding. Unlike conventional prompting techniques, we formalize egocentric reasoning using a structural causal model. Then, we ground interaction regions and expand them into hierarchical visual cues, augmented with corresponding captions, to form the initial observations. To reduce noise in these observations, we employ uncertainty calibration to filter out unreliable information. These refined observations then serve as mediators and are incorporated into the prompt template, guiding the model to understand semantics from a first-person perspective. Extensive experiments conducted on the EgoThink benchmark demonstrate that our FRUIT method consistently enhances the performance of existing LVLMs on six distinct tasks. Our code is available at https://github.com/Mrshenshen/FRUIT.
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Run Luo | Haonan Zhang | Longze Chen | Ting-En Lin | Xiong Liu | Yuchuan Wu | Min Yang | Yongbin Li | Minzheng Wang | Pengpeng Zeng | Lianli Gao | Heng Tao Shen | Yunshui Li | Hamid Alinejad-Rokny | Xiaobo Xia | Jingkuan Song | Fei Huang
Findings of the Association for Computational Linguistics: ACL 2025
The development of Multimodal Large Language Models (MLLMs) has seen significant progress, driven by increasing demands across various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches aim to enhance MLLM capabilities through diverse architectures, their performance gains have become increasingly marginal. In contrast, data-driven methods, which scale up image-text instruction datasets, have proven more effective but face challenges related to limited data diversity and complexity. The absence of high-quality instruction data remains a major bottleneck in MLLM development. To address this issue, we propose MMEvol, a novel multimodal instruction data evolution framework. This framework iteratively enhances data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution, generating a more complex and diverse image-text instruction dataset that significantly improves MLLM capabilities. Starting with an initial dataset, SEED-163K, we employ MMEvol to systematically expand instruction diversity, extend visual reasoning steps to improve cognitive abilities, and extract fine-grained visual details to enhance understanding and robustness. To rigorously evaluate our approach, we conduct extensive qualitative analysis and quantitative experiments across 13 vision-language tasks. Compared to baseline models trained on the original seed dataset, our method achieves an average accuracy improvement of 3.1 percentage points. Moreover, our approach attains state-of-the-art (SOTA) performance in nine tasks while using significantly less data than existing state-of-the-art models.
2024
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Zhiyuan Wang | Jinhao Duan | Lu Cheng | Yue Zhang | Qingni Wang | Xiaoshuang Shi | Kaidi Xu | Heng Tao Shen | Xiaofeng Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the closed-source nature of the latest large language models (LLMs). This study investigates applying conformal prediction (CP), which can transform any heuristic uncertainty notion into rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We introduce a novel uncertainty measure based on self-consistency theory, and then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods. Furthermore, we achieve strict control over the correctness coverage rate utilizing 7 popular LLMs on 4 free-form NLG datasets, spanning general-purpose and medical scenarios. Additionally, the small size of the calibrated prediction sets further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
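The general mechanism by which conformal prediction turns a heuristic uncertainty score into a prediction set can be sketched with the standard split conformal procedure. This is a generic illustration, not the ConU criterion itself: the uncertainty scores are assumed to be given (for example, one minus the fraction of sampled generations that agree with a candidate answer), and the function names are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split conformal quantile over calibration nonconformity scores.

    cal_scores: uncertainty score of a correct answer for each calibration
                question (lower means more confident).
    alpha:      target error rate; coverage is at least 1 - alpha under
                exchangeability of calibration and test data.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        # Too few calibration points for this alpha: keep every candidate.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose uncertainty is within the threshold."""
    return [c for c, s in candidate_scores.items() if s <= threshold]

# Toy calibration scores and candidate answers (made-up numbers).
threshold = conformal_threshold([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], alpha=0.25)
answers = prediction_set({"a": 0.2, "b": 0.6, "c": 0.9}, threshold)
```

Smaller prediction sets at the same coverage level indicate a more informative uncertainty score, which is the sense in which set size measures efficiency.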
Co-authors
- Jingkuan Song 2
- Pengpeng Zeng 2
- Hamid Alinejad-Rokny 1
- Yi Bin 1
- Longze Chen 1
- Shuxu Chen 1
- Lu Cheng 1
- Jinhao Duan 1
- Lianli Gao 1
- Fei Huang 1
- Hengyu Jin 1
- Tae-Ho Kim 1
- Chenghao Li 1
- Shenshen Li 1
- Yongbin Li 1
- Yunshui Li 1
- Ting-En Lin 1
- Xiong Liu 1
- Run Luo 1
- Wenxin Meng 1
- Chong Peng 1
- Liang Peng 1
- Fachrina Dewi Puspitasari 1
- Fumin Shen 1
- Xiaoshuang Shi 1
- Qigan Sun 1
- Guoqing Wang 1
- Lei Wang 1
- Minzheng Wang 1
- Qingni Wang 1
- Xudong Wang 1
- Zhiyuan Wang 1
- Jiwei Wei 1
- Yuchuan Wu 1
- Xiaobo Xia 1
- Kaidi Xu 1
- Xing Xu 1
- Peng Yan 1
- Hao Yang 1
- Min Yang 1
- Yang Yang 1
- Wencheng Ye 1
- Xiaoyang Yuan 1
- Chaoning Zhang 1
- Haonan Zhang 1
- Jiaquan Zhang 1
- Malu Zhang 1
- Yue Zhang 1
- Xiaofeng Zhu 1