ProMCP: Profiling Token Flows and Latency Costs in Model Context Protocol–Based LLM Agents

Sumera Anjum; Weijian Zheng; Rajkumar Kettimuthu; Heng Fan; Yunhe Feng

doi:10.18653/v1/2026.findings-acl.1967

Abstract

The Model Context Protocol (MCP) aims to standardize the integration of Large Language Models (LLMs) with external tools, yet existing research primarily evaluates functional capabilities while treating the underlying protocol as a black box. This oversight obscures inefficiencies in token flows and latency that are distributed across MCP’s decoupled Host-Client-Server architecture. In this paper, we introduce ProMCP, an end-to-end profiling and instrumentation framework that decomposes the MCP workflow into a six-stage communication pipeline, enabling granular attribution of computational costs. We evaluate widely varying deployment topologies, from air-gapped local models to commercial off-the-shelf (OTS) clients, across 20 servers and 169 tools from MCP-Bench and MCP-Universe. Our analysis reveals a distinct inversion in performance bottlenecks: topologies with customized clients devote 56–72% of total tokens and 60–67% of latency to planning and schema injection, whereas OTS clients concentrate over 85% of latency in final answer synthesis. Tool execution itself accounts for a negligible fraction of the total cost across all configurations. These findings establish a quantitative baseline for protocol overhead and demonstrate that future optimization must target schema orchestration and transport efficiency rather than tool execution speed. The code is available at: https://github.com/ResponsibleAILab/ProMCP.
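The per-stage cost attribution described in the abstract can be illustrated with a minimal sketch. This is not the ProMCP implementation: the class `StageProfiler`, its methods, and the stage names used below are hypothetical, chosen only to show how latency and token counts might be accumulated per stage and turned into shares of the total.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageProfiler:
    """Hypothetical helper: accumulates latency and token counts per named stage."""

    def __init__(self):
        self.latency = defaultdict(float)  # seconds spent in each stage
        self.tokens = defaultdict(int)     # tokens attributed to each stage

    @contextmanager
    def stage(self, name):
        # Time the enclosed block and charge the elapsed time to `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latency[name] += time.perf_counter() - start

    def add_tokens(self, name, count):
        # Charge `count` tokens (prompt or completion) to `name`.
        self.tokens[name] += count

    @staticmethod
    def shares(totals):
        # Convert absolute per-stage totals into fractions of the grand total.
        grand = sum(totals.values())
        return {k: (v / grand if grand else 0.0) for k, v in totals.items()}


if __name__ == "__main__":
    profiler = StageProfiler()
    # Stage names here are illustrative assumptions, not the paper's six stages.
    with profiler.stage("planning"):
        profiler.add_tokens("planning", 1200)
    with profiler.stage("tool_execution"):
        profiler.add_tokens("tool_execution", 40)
    print(profiler.shares(profiler.tokens))
    print(profiler.shares(profiler.latency))
```

Keeping token and latency totals in separate maps lets the same `shares` function report both breakdowns, which mirrors the abstract's separate percentages for tokens and latency.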

Anthology ID: 2026.findings-acl.1967
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 39476–39487
URL: https://aclanthology.org/2026.findings-acl.1967/
DOI: 10.18653/v1/2026.findings-acl.1967
Cite (ACL): Sumera Anjum, Weijian Zheng, Rajkumar Kettimuthu, Heng Fan, and Yunhe Feng. 2026. ProMCP: Profiling Token Flows and Latency Costs in Model Context Protocol–Based LLM Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39476–39487, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ProMCP: Profiling Token Flows and Latency Costs in Model Context Protocol–Based LLM Agents (Anjum et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1967.pdf
Checklist: 2026.findings-acl.1967.checklist.pdf
