Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning

Tejas Srinivasan; Jack Hessel; Tanmay Gupta; Bill Yuchen Lin; Yejin Choi; Jesse Thomason; Khyathi Chandu

doi:10.18653/v1/2024.findings-acl.767

Abstract

Selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain. However, when deploying a vision-language system with low tolerance for inaccurate predictions, selective prediction may be over-cautious and abstain too frequently, even on many correct predictions. We introduce ReCoVERR, an inference-time algorithm to reduce the over-abstention of a selective vision-language system without increasing the error rate of the system’s predictions. When the VLM makes a low-confidence prediction, instead of abstaining, ReCoVERR tries to find relevant clues in the image that provide additional evidence for the prediction. ReCoVERR uses an LLM to pose related questions to the VLM and collects high-confidence evidence; if enough evidence confirms the prediction, the system makes a prediction instead of abstaining. ReCoVERR enables three VLMs (BLIP2, InstructBLIP and LLaVA-1.5) to answer up to 20% more questions on the VQAv2 and A-OKVQA tasks without decreasing system accuracy, thus improving overall system reliability. Our code is available at https://github.com/tejas1995/ReCoVERR.
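
The control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`selective_answer_with_evidence`, `vlm`, `propose_questions`, `supports`, `conf_threshold`) is hypothetical, the image is assumed to be bound inside the `vlm` callable, and the abstract does not specify how "enough evidence" is decided, so that check is left as a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types; none of these names come from the ReCoVERR codebase.
Answer = Tuple[str, float]  # (answer text, confidence in [0, 1])


@dataclass
class Evidence:
    question: str
    answer: str
    confidence: float


def selective_answer_with_evidence(
    question: str,
    vlm: Callable[[str], Answer],
    propose_questions: Callable[[str, str], List[str]],
    supports: Callable[[List[Evidence], str, str], bool],
    conf_threshold: float = 0.8,  # assumed value, for illustration only
) -> Optional[str]:
    """Return an answer, or None to abstain."""
    answer, conf = vlm(question)
    if conf >= conf_threshold:
        # Ordinary selective prediction: confident enough to answer directly.
        return answer

    # Low-confidence prediction: ask related questions (an LLM in the paper)
    # and keep only the answers the VLM gives with high confidence.
    evidence: List[Evidence] = []
    for related in propose_questions(question, answer):
        rel_answer, rel_conf = vlm(related)
        if rel_conf >= conf_threshold:
            evidence.append(Evidence(related, rel_answer, rel_conf))

    # Answer only if the collected evidence is judged to confirm the
    # original prediction; otherwise abstain.
    if evidence and supports(evidence, question, answer):
        return answer
    return None
```

Passing the VLM, the question generator, and the sufficiency check as callables keeps the sketch independent of any particular model or API; a real system would substitute its own components for each.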

Anthology ID: 2024.findings-acl.767
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12935–12948
URL: https://aclanthology.org/2024.findings-acl.767
DOI: 10.18653/v1/2024.findings-acl.767
Cite (ACL): Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, and Khyathi Chandu. 2024. Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 12935–12948, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning (Srinivasan et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.767.pdf
