Haozhan Shen


pdf bib
An Explainable Toolbox for Evaluating Pre-trained Vision-Language Models
Tiancheng Zhao | Tianqi Zhang | Mingwei Zhu | Haozhan Shen | Kyusong Lee | Xiaopeng Lu | Jianwei Yin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We introduce VL-CheckList, a toolbox for evaluating Vision-Language Pretraining (VLP) models, including the preliminary datasets that deepen the image-texting ability of a VLP model. Most existing VLP works evaluated their systems by comparing the fine-tuned downstream task performance. However, only average downstream task accuracy provides little information about the pros and cons of each VLP method. In this paper, we demonstrate how minor input changes in language and vision will affect the prediction outputs. Then, we describe the detailed user guidelines to utilize and contribute to the community. We show new findings on one of the representative VLP models to provide an example analysis. The data/code is available at https://github.com/om-ai-lab/VL-CheckList