From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li


Abstract
To address the challenge of adapting pre-trained vision-language models to generate insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm. Our method iteratively computes visual features (conditioned on the text input), an answer, and an explanation, improving the explanation quality step by step until the answer converges. We find that this multi-step approach guides the model to correct its own answers and outperforms single-step explanation generation. Furthermore, explanations generated by ReVisE also serve as valuable annotations for few-shot self-training. Across 10 metrics, our approach outperforms previous methods while using only 5% of the human-annotated explanations, with increases of up to 4.2 and 1.3 in BLEU-1 score on the VCR and VQA-X datasets, respectively, underscoring the efficacy and data efficiency of our method.
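The recursive loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `step` callable, its argument order, the string-typed image handle, and the `max_steps` cap are all assumptions introduced here for illustration. The sketch only shows the control flow of feeding the previous explanation back in until the answer stops changing.

```python
from typing import Callable, Tuple

# Hypothetical step function (an assumption, not an API from the paper):
# (image, question, previous_explanation) -> (answer, explanation).
# In practice this would wrap a vision-language model that recomputes
# text-conditioned visual features on every call.
StepFn = Callable[[str, str, str], Tuple[str, str]]


def recursive_explain(
    step: StepFn, image: str, question: str, max_steps: int = 5
) -> Tuple[str, str, int]:
    """Call `step` repeatedly, passing the previous explanation back in,
    until two consecutive answers agree or `max_steps` calls are used.

    Returns (answer, explanation, number_of_calls).
    """
    # First pass: no explanation is available yet.
    answer, explanation = step(image, question, "")
    for n in range(1, max_steps):
        new_answer, new_explanation = step(image, question, explanation)
        if new_answer == answer:
            # Answer converged: keep the most recent explanation.
            return new_answer, new_explanation, n + 1
        answer, explanation = new_answer, new_explanation
    return answer, explanation, max_steps


def toy_step(image: str, question: str, prev_explanation: str) -> Tuple[str, str]:
    # Toy stand-in for a model: answers "wrong" with no context and
    # "right" once any explanation is supplied.
    if not prev_explanation:
        return "wrong", "first guess"
    return "right", "refined after: " + prev_explanation


answer, explanation, calls = recursive_explain(toy_step, "img.jpg", "What is happening?")
```

With the toy step, the first call answers without context, the second call changes the answer, and the third call confirms it, so the loop stops there. The cap on steps is a safeguard assumed here for answers that never stabilize.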
Anthology ID:
2023.emnlp-main.75
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1173–1185
URL:
https://aclanthology.org/2023.emnlp-main.75
DOI:
10.18653/v1/2023.emnlp-main.75
Cite (ACL):
Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, and Boyi Li. 2023. From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1173–1185, Singapore. Association for Computational Linguistics.
Cite (Informal):
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation (Ge et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.75.pdf
Video:
https://aclanthology.org/2023.emnlp-main.75.mp4