SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

Shengyue Guan; Yihao Liu; Lang Cao

Abstract

Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We further propose SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool use, achieving the strongest and most consistent tool-calling performance. Our work establishes a principled benchmark for studying reliable long-horizon orchestration in real-world operational settings and highlights significant room for improvement in LLM-based supply chain agents.

Anthology ID: 2026.findings-acl.371
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7526–7550
URL: https://aclanthology.org/2026.findings-acl.371/
Cite (ACL): Shengyue Guan, Yihao Liu, and Lang Cao. 2026. SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7526–7550, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management (Guan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.371.pdf
Checklist: 2026.findings-acl.371.checklist.pdf
