When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems

Xin Yang; Junhao Wang; Bintao Tang; Xuxin Cheng; Cao Liu; Ke Zeng; Wenyuan Jiang

Abstract

Current LLM-based multi-agent systems remain fragile under scaling, even on algorithmically trivial tasks. We introduce MAS-BENCH, a distributed-sorting benchmark that isolates coordination under explicit communication constraints: each agent observes only a local segment and must collectively produce a globally consistent order via broadcasting, peer-to-peer messaging, or a shared key-value store. Across LLM-based agents, success drops sharply as the number of agents grows, exposing persistent failures in shared state, convention alignment, and consistent termination. To mitigate these breakdowns, we propose CAMOC, a lightweight, drop-in proof-of-concept built on collaboration-aware information sharing, early global metadata exchange, and single-commit verification. CAMOC substantially improves coordination success and efficiency across backends, with the largest gains under shared-state interaction. Overall, MAS-BENCH provides a diagnostic benchmark and CAMOC offers a practical step toward more reliable large-scale LLM collaboration, highlighting a gap between individual reasoning and collective correctness.
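
To make the task setup concrete, the following is a minimal sketch of the shared key-value store setting described in the abstract. It is not the authors' implementation of MAS-BENCH or CAMOC: the agents here are plain deterministic functions rather than LLMs, and the function names, the key layout (`meta/`, `sorted/`, `commit`), and the specific verification checks are all assumptions made for illustration. It only shows the general shape of the three ideas the abstract names: each agent sees a local segment, metadata is shared early, and the final result is written once after verification.

```python
import heapq
from typing import Any, Dict, List


def local_summary(segment: List[int]) -> Dict[str, Any]:
    """Metadata an agent can compute from its own segment alone (hypothetical format)."""
    return {
        "min": min(segment, default=None),
        "max": max(segment, default=None),
        "count": len(segment),
    }


def run_shared_store_sort(segments: List[List[int]]) -> List[int]:
    """Simulate agents coordinating through a shared key-value store.

    Each entry of `segments` is the local view of one agent. The store is a
    plain dict standing in for whatever shared state a real system would use.
    """
    store: Dict[str, Any] = {}
    n_agents = len(segments)

    # Phase 1 (assumed): every agent publishes metadata before any sorting work.
    for agent_id, segment in enumerate(segments):
        store[f"meta/{agent_id}"] = local_summary(segment)

    # Phase 2 (assumed): every agent publishes its locally sorted segment.
    for agent_id, segment in enumerate(segments):
        store[f"sorted/{agent_id}"] = sorted(segment)

    # Phase 3 (assumed): one designated agent merges, verifies, and commits once.
    runs = [store[f"sorted/{i}"] for i in range(n_agents)]
    merged = list(heapq.merge(*runs))

    expected_count = sum(store[f"meta/{i}"]["count"] for i in range(n_agents))
    is_complete = len(merged) == expected_count
    is_ordered = all(a <= b for a, b in zip(merged, merged[1:]))
    if not (is_complete and is_ordered):
        raise ValueError("verification failed; nothing committed")

    store["commit"] = merged  # the only write to the final result key
    return store["commit"]


if __name__ == "__main__":
    print(run_shared_store_sort([[5, 1], [4, 2], [3]]))
```

In this deterministic sketch coordination cannot fail, because every participant follows the same key layout and phase order by construction. The failures the abstract reports (shared state, convention alignment, consistent termination) concern exactly those parts, which LLM-based agents must agree on through communication.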

Anthology ID: 2026.findings-acl.1698
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 34002–34021
URL: https://aclanthology.org/2026.findings-acl.1698/
Cite (ACL): Xin Yang, Junhao Wang, Bintao Tang, Xuxin Cheng, Cao Liu, Ke Zeng, and Wenyuan Jiang. 2026. When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34002–34021, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems (Yang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1698.pdf
Checklist: 2026.findings-acl.1698.checklist.pdf