Teaching People LLM’s Errors and Getting it Right

Nathan Stringham; Fateme Hashemi Chaleshtori; Xinyuan Yan; Zhichao Xu; Bei Wang; Ana Marasović

Abstract

People often rely on large language models (LLMs) in situations where they are ill-suited. This miscalibration is understandable: seeing LLMs compose poetry and answer complex questions can lead users to assume, incorrectly, that they will also handle simple tasks, such as basic arithmetic, without error. Prior work has attempted to address this issue by clustering instance embeddings to identify regions where an LLM is likely to fail, then automatically describing the patterns within those regions. These inferred “failure patterns” are taught to users to reduce overreliance. Yet, this approach has not been fully successful. In this paper, we investigate why. We first examine whether the negative results stem from an absence of meaningful failure patterns. Using two datasets, we group instances by their meta-labels and evaluate LLM performance within each group. We then define criteria to identify groups that are both sufficiently large and exhibit high error rates. This process reveals multiple meta-label groups that meet these criteria, indicating that actionable failure patterns do, in fact, exist. Next, we test whether prompting- and embedding-based methods can reliably surface these known failure patterns. This step is critical: if such patterns cannot be surfaced automatically, they cannot be communicated to users. We observe mixed performance across methods, which may explain the limited success of prior approaches. Finally, we revisit how teaching effectiveness is measured. We propose evaluating whether users can apply learned failure patterns to anticipate when an LLM is likely to err. A user study shows that instruction based on this metric yields measurable improvements, unlike standard human–AI team accuracy metrics. Overall, our findings suggest that teaching failure patterns can be an effective way to mitigate overreliance, but its success depends on improved automated methods for discovering these patterns and on evaluation metrics like ours.
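
The grouping step described in the abstract (group instances by meta-label, measure the LLM's error rate per group, keep groups that are both large enough and error-prone enough) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_failure_groups`, the `(meta_label, is_correct)` record format, and the default thresholds `min_size` and `min_error_rate` are all assumptions chosen for illustration, since the abstract does not state the actual criteria.

```python
from collections import defaultdict


def find_failure_groups(records, min_size=30, min_error_rate=0.5):
    """Return meta-label groups that are large and have a high error rate.

    records: iterable of (meta_label, is_correct) pairs, one per instance.
    min_size, min_error_rate: hypothetical thresholds, not taken from the paper.
    """
    totals = defaultdict(int)  # number of instances per meta-label
    errors = defaultdict(int)  # number of LLM errors per meta-label
    for meta_label, is_correct in records:
        totals[meta_label] += 1
        if not is_correct:
            errors[meta_label] += 1

    flagged = {}
    for label, n in totals.items():
        rate = errors[label] / n
        # Keep only groups that satisfy both criteria.
        if n >= min_size and rate >= min_error_rate:
            flagged[label] = (n, rate)
    return flagged
```

Under these assumptions, a group with many instances but few errors, or with a high error rate on only a handful of instances, would not be reported as a candidate failure pattern.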

Anthology ID: 2026.trustnlp-main.11
Volume: Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month: July
Year: 2026
Address: San Diego, California
Editors: Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galystan, Anoop Kumar, Rahul Gupta
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 204–226
URL: https://aclanthology.org/2026.trustnlp-main.11/
Cite (ACL): Nathan Stringham, Fateme Hashemi Chaleshtori, Xinyuan Yan, Zhichao Xu, Bei Wang, and Ana Marasović. 2026. Teaching People LLM’s Errors and Getting it Right. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 204–226, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Teaching People LLM’s Errors and Getting it Right (Stringham et al., TrustNLP 2026)
PDF: https://aclanthology.org/2026.trustnlp-main.11.pdf
