Coding Agents with Multimodal Browsing are Generalist Problem Solvers

Aditya Bharat Soni; Boxuan Li; Xingyao Wang; Valerie Chen; Graham Neubig

Abstract

Modern human labor is characterized by specialization; we train for years and develop particular tools that allow us to perform well across a variety of tasks. Similarly, specialized AI agents with task-specific tools or architectures often fail to generalize beyond their intended scope. In this work, we ask: *can agents achieve generalizability across diverse domains with a small but well-chosen set of general tools?* We propose OpenHands-Versa, a single-agent system with a modest number of general tools, such as code execution, a search engine, a web browser, and a multimodal file viewer, for three practical domains: software engineering, deep research, and web browsing. Notably, OpenHands-Versa demonstrates superior or competitive performance over task-specific specialized agents on three challenging benchmarks: SWE-Bench Multimodal, GAIA, and The Agent Company, with absolute improvements in success rate of **9.1**, **1.3**, and **9.1** points, respectively. Thus, our *single-agent* system can achieve strong generalization, suggesting that a generalist agent can match or outperform specialist agents in these domains. Furthermore, we find that specialist multi-agent systems do not generalize beyond their intended scope. These findings establish OpenHands-Versa as a strong baseline for future research.

Anthology ID: 2026.findings-eacl.318
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6052–6069
URL: https://aclanthology.org/2026.findings-eacl.318/
Cite (ACL): Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, and Graham Neubig. 2026. Coding Agents with Multimodal Browsing are Generalist Problem Solvers. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6052–6069, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Coding Agents with Multimodal Browsing are Generalist Problem Solvers (Soni et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.318.pdf
Checklist: 2026.findings-eacl.318.checklist.pdf