Yingshan Chang
2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun | So Yeon Min | Yingshan Chang | Yonatan Bisk
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool-use have assumed the core challenge for LLMs is choosing the tool. Instead, we introduce a framework for tools more broadly which guides us to explore a model’s ability to detect “silent” tool errors, and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.
VISREAS: Complex Visual Reasoning with Unanswerable Questions
Syeda Nahida Akter | Sangwu Lee | Yingshan Chang | Yonatan Bisk | Eric Nyberg
Findings of the Association for Computational Linguistics: ACL 2024
Verifying a question’s validity before answering is crucial in real-world applications, where users may provide imperfect instructions. In this scenario, an ideal model should address the discrepancies in the query and convey them to the users rather than generating the best possible answer. Addressing this requirement, we introduce a new compositional visual question-answering dataset, VisReas, that consists of answerable and unanswerable visual queries formulated by traversing and perturbing commonalities and differences among objects, attributes, and relations. VisReas contains 2.07M semantically diverse queries generated automatically using Visual Genome scene graphs. The unique feature of this task, validating question answerability with respect to an image before answering, and the poor performance of state-of-the-art models inspired the design of a new modular baseline, Logic2Vision, which reasons by producing and executing pseudocode, without any external modules, to generate the answer. Logic2Vision outperforms generative models on VisReas (+4.82% over LLaVA-1.5; +12.23% over InstructBLIP) and achieves a significant gain in performance against the classification models.