On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning

Changyu Liu; Yiyang Liu; Taowen Wang; Qiao Zhuang; James Chenhao Liang; Wenhao Yang; Renjing Xu; Qifan Wang; Dongfang Liu; Cheng Han

Abstract

Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular, they are primarily trained via supervised fine-tuning (SFT) or training-time reinforcement learning (RL), requiring explicit fine-tuning phases, human interventions, or controlled data collection. Consequently, existing methods remain unsuitable for challenging simulated or physical-world deployments, where robots must respond autonomously and flexibly to evolving environments. To address this limitation, we introduce Test-Time Reinforcement Learning for VLAs (TT-VLA), a framework that enables on-the-fly policy adaptation during inference. TT-VLA formulates a dense reward mechanism that leverages step-by-step task-progress signals to refine action policies at test time while preserving the SFT/RL-trained priors, making it an effective supplement to current VLA models. Empirical results show that our approach enhances overall adaptability, stability, and task success in dynamic, previously unseen scenarios under simulated and real-world settings. We believe TT-VLA offers a principled step toward self-improving, deployment-ready VLAs.

Anthology ID: 2026.acl-long.1863
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 40107–40125
URL: https://aclanthology.org/2026.acl-long.1863/
Cite (ACL): Changyu Liu, Yiyang Liu, Taowen Wang, Qiao Zhuang, James Chenhao Liang, Wenhao Yang, Renjing Xu, Qifan Wang, Dongfang Liu, and Cheng Han. 2026. On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 40107–40125, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning (Liu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1863.pdf
Checklist: 2026.acl-long.1863.checklist.pdf
