The Gap in the Strategy of Recovering Task Failure between GPT-4V and Humans in a Visual Dialogue

Ryosuke Oshima; Seitaro Shinagawa; Shigeo Morishima

doi:10.18653/v1/2024.sigdial-1.62

Abstract

Goal-oriented dialogue systems interact with humans to accomplish specific tasks. However, these systems sometimes fail to establish common ground with users, which leads to task failure. To build a robust goal-oriented dialogue system, it is crucial not to end with the failure but to correct and recover the dialogue so that it turns into a success. Effective recovery involves not only succeeding at the task in the end but also accurately understanding the situation in which the task failed, so as to minimize unnecessary interactions and avoid frustrating the user. In this study, we analyze the capability of GPT-4V to recover from task failures by comparing its performance with that of humans in the Guess What?! game. The results show that GPT-4V employs less efficient recovery strategies than humans, such as asking additional unnecessary questions. We also found that while humans occasionally ask questions that doubt the accuracy of the interlocutor’s answer during task recovery, GPT-4V lacks this capability.

Anthology ID: 2024.sigdial-1.62
Volume: Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: September
Year: 2024
Address: Kyoto, Japan
Editors: Tatsuya Kawahara, Vera Demberg, Stefan Ultes, Koji Inoue, Shikib Mehri, David Howcroft, Kazunori Komatani
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 728–745
URL: https://aclanthology.org/2024.sigdial-1.62/
DOI: 10.18653/v1/2024.sigdial-1.62
Cite (ACL): Ryosuke Oshima, Seitaro Shinagawa, and Shigeo Morishima. 2024. The Gap in the Strategy of Recovering Task Failure between GPT-4V and Humans in a Visual Dialogue. In Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 728–745, Kyoto, Japan. Association for Computational Linguistics.
Cite (Informal): The Gap in the Strategy of Recovering Task Failure between GPT-4V and Humans in a Visual Dialogue (Oshima et al., SIGDIAL 2024)
PDF: https://aclanthology.org/2024.sigdial-1.62.pdf
