Transparent and Coherent Procedural Mistake Detection

Shane Storks; Itamar Bar-Yossef; Yayuan Li; Zheyuan Zhang; Jason J Corso; Joyce Chai

doi:10.18653/v1/2025.emnlp-main.706

Abstract

Procedural mistake detection (PMD) is a challenging problem of classifying whether a human user (observed through egocentric video) has successfully executed a task (specified by a procedural text). Despite significant recent efforts, machine performance in the wild remains nonviable, and the reasoning processes underlying this performance are opaque. As such, we extend PMD to require generating visual self-dialog rationales to inform decisions. Given the impressive, mature image understanding capabilities observed in recent vision-and-language models (VLMs), we curate a suitable benchmark dataset for PMD based on individual frames. As our reformulation enables unprecedented transparency, we leverage a natural language inference (NLI) model to formulate two automated metrics for the coherence of generated rationales. We establish baselines for this reframed task, showing that VLMs struggle off-the-shelf, but with some trade-offs, their accuracy, coherence, and efficiency can be improved by incorporating these metrics into common inference and fine-tuning methods. Lastly, our multi-faceted metrics visualize common outcomes, highlighting areas for further improvement.

Anthology ID: 2025.emnlp-main.706
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 13968–14002
URL: https://aclanthology.org/2025.emnlp-main.706/
DOI: 10.18653/v1/2025.emnlp-main.706
Cite (ACL): Shane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J Corso, and Joyce Chai. 2025. Transparent and Coherent Procedural Mistake Detection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13968–14002, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Transparent and Coherent Procedural Mistake Detection (Storks et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.706.pdf
Checklist: 2025.emnlp-main.706.checklist.pdf
