EVil-Probe - a Composite Benchmark for Extensive Visio-Linguistic Probing

Marie Bexte, Andrea Horbach, Torsten Zesch


Abstract
Research probing the language comprehension of visio-linguistic models has gained traction due to their remarkable performance on various tasks. We introduce EViL-Probe, a composite benchmark that processes existing probing datasets into a unified format and reorganizes them based on the linguistic categories they probe. On top of the commonly used negative probes, this benchmark introduces positive probes to more rigorously test the robustness of models. Since the language side alone may introduce a bias that models could exploit in solving the probes, we estimate the difficulty of the individual subsets with a language-only baseline. Using the benchmark to probe a set of state-of-the-art visio-linguistic models sheds light on how sensitive they are to the different linguistic categories. Results show that the benchmark is challenging for all models we probe, as their performance is around the chance baseline for many of the categories. The only category all models are able to handle relatively well is nouns. Additionally, models that use a Vision Transformer to process the images are somewhat robust against probes targeting color and image type. Among these models, our enrichment of EViL-Probe with positive probes helps further discriminate performance, showing BLIP to be the overall best-performing model.
Anthology ID:
2024.lrec-main.591
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
6682–6700
URL:
https://aclanthology.org/2024.lrec-main.591
Cite (ACL):
Marie Bexte, Andrea Horbach, and Torsten Zesch. 2024. EVil-Probe - a Composite Benchmark for Extensive Visio-Linguistic Probing. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6682–6700, Torino, Italia. ELRA and ICCL.
Cite (Informal):
EVil-Probe - a Composite Benchmark for Extensive Visio-Linguistic Probing (Bexte et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.591.pdf