An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models

Tiancheng Zhao, Tianqi Zhang, Mingwei Zhu, Haozhan Shen, Kyusong Lee, Xiaopeng Lu, Jianwei Yin


Abstract
We introduce VL-CheckList, a toolbox for evaluating Vision-Language Pretraining (VLP) models, together with curated datasets that probe the image-text understanding ability of a VLP model. Most existing VLP works evaluate their systems by comparing fine-tuned downstream task performance. However, average downstream accuracy alone provides little information about the strengths and weaknesses of each VLP method. In this paper, we demonstrate how minor input changes in language and vision affect a model's prediction outputs. We then describe detailed user guidelines for using the toolbox and contributing to the community. Finally, we present new findings on a representative VLP model as an example analysis. The data and code are available at https://github.com/om-ai-lab/VL-CheckList
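
The core idea, sketched below, is to probe a VLP model with an image and two nearly identical captions, one correct and one minimally perturbed (e.g., a swapped object word), and to check whether the model's matching score still prefers the correct caption. The sketch uses an off-the-shelf CLIP model from HuggingFace Transformers purely as a stand-in VLP model; it is not the VL-CheckList API, and the image path and captions are placeholders.

# Minimal sketch (assumed stand-in, not the VL-CheckList API): compare a
# VLP model's matching scores for an original caption vs. a minimally
# perturbed one on the same image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                 # placeholder image path
original = "a dog sitting on a red sofa"          # ground-truth caption
perturbed = "a cat sitting on a red sofa"         # minimal object swap

inputs = processor(text=[original, perturbed], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image[0]  # image-text similarity

# A robust model should score the original caption higher; aggregating this
# preference over many object/attribute/relation edits yields the kind of
# fine-grained capability report that VL-CheckList automates.
print({"original": scores[0].item(),
       "perturbed": scores[1].item(),
       "prefers_original": bool(scores[0] > scores[1])})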
Anthology ID: 2022.emnlp-demos.4
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: December
Year: 2022
Address: Abu Dhabi, UAE
Editors: Wanxiang Che, Ekaterina Shutova
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 30–37
URL: https://aclanthology.org/2022.emnlp-demos.4
DOI: 10.18653/v1/2022.emnlp-demos.4
Cite (ACL): Tiancheng Zhao, Tianqi Zhang, Mingwei Zhu, Haozhan Shen, Kyusong Lee, Xiaopeng Lu, and Jianwei Yin. 2022. An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 30–37, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models (Zhao et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-demos.4.pdf