MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference

Jeonghyun Park; Ingeol Baek; Seunghyun Yoon; Haeun Jang; Aparna Garimella; Akriti Jain; Nedim Lipka; Hwanhee Lee

Abstract

Real-world multi-hop QA is inherently tied to ambiguity: a single query can trigger multiple reasoning paths, each of which requires independent resolution. Because ambiguity can occur at any stage, models must navigate layered uncertainty throughout the entire reasoning chain. Although such ambiguity is prevalent in real-world user queries, previous benchmarks have primarily focused on single-hop ambiguity, leaving the complex interaction between multi-step inference and layered ambiguity underexplored. In this paper, we introduce MARCH, a benchmark for the intersection of ambiguity interpretation and multi-hop inference, comprising 2,209 multi-hop ambiguous questions curated via multi-LLM verification and validated by human annotation with strong agreement. Our experiments reveal that even state-of-the-art models struggle with MARCH, confirming that combining ambiguity resolution with multi-step reasoning is a significant challenge. To address this, we propose CLARION, a two-stage agentic framework that explicitly decouples ambiguity planning from evidence-driven reasoning. CLARION significantly outperforms existing approaches and paves the way for robust reasoning systems.

Anthology ID: 2026.findings-acl.1352
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 27089–27115
URL: https://aclanthology.org/2026.findings-acl.1352/
Cite (ACL): Jeonghyun Park, Ingeol Baek, Seunghyun Yoon, Haeun Jang, Aparna Garimella, Akriti Jain, Nedim Lipka, and Hwanhee Lee. 2026. MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27089–27115, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference (Park et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1352.pdf
Checklist: 2026.findings-acl.1352.checklist.pdf