Does Chain-of-Thought Reasoning Help Mobile GUI Agents? An Empirical Study

Li Zhang; Longxi Gao; Mengwei Xu

Abstract

Reasoning capabilities have significantly improved the performance of vision-language models (VLMs) in domains such as mathematical problem-solving, coding, and visual question-answering. However, their impact on real-world applications remains unclear. This paper presents a large-scale empirical study on the effectiveness of reasoning-enabled VLMs in mobile GUI agents. We evaluate six pairs of VLMs, including both commercial and open-source lightweight models, by comparing their base and reasoning-enhanced versions across static and interactive benchmarks. Our findings show that reasoning-enabled VLMs generally provide only marginal improvements over their non-reasoning counterparts and can even degrade performance in certain agent configurations. Notably, reasoning and non-reasoning VLMs fail on different sets of tasks, suggesting that reasoning does have an impact, but its benefits and drawbacks counterbalance each other. We attribute these inconsistencies to the limitations of benchmarks and VLMs. Based on the findings, we provide insights for further enhancing mobile GUI agents in terms of benchmarks, VLMs, and their adaptability in dynamically invoking reasoning VLMs.

Anthology ID: 2026.findings-acl.392
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7981–7996
URL: https://aclanthology.org/2026.findings-acl.392/
Cite (ACL): Li Zhang, Longxi Gao, and Mengwei Xu. 2026. Does Chain-of-Thought Reasoning Help Mobile GUI Agents? An Empirical Study. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7981–7996, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Does Chain-of-Thought Reasoning Help Mobile GUI Agents? An Empirical Study (Zhang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.392.pdf
Checklist: 2026.findings-acl.392.checklist.pdf
