BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences

Hiroto Otake; Peinan Zhang; Yusuke Sakai; Masato Mita; Hiroki Ouchi; Taro Watanabe

doi:10.18653/v1/2025.findings-emnlp.1311

Abstract

Web banner advertisements, which are placed on websites to guide users to a targeted landing page (LP), are still often selected manually because human preferences are important in selecting which ads to deliver. To automate this process, we propose a new benchmark, BannerBench, to evaluate the human preference-driven banner selection process using vision-language models (VLMs). This benchmark assesses the degree of alignment with human preferences in two tasks: a ranking task and a best-choice task, both using sets of five images derived from a single LP. Our experiments show that VLMs are moderately correlated with human preferences on the ranking task. In the best-choice task, most VLMs perform close to chance level across various prompting strategies. These findings suggest that although VLMs have a basic understanding of human preferences, most of them struggle to pinpoint a single suitable option from many candidates.
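As a rough illustration of how the two tasks could be scored, the sketch below computes a rank correlation for the ranking task and an accuracy for the best-choice task. The abstract does not name the metrics used, so Spearman's rho and plain accuracy are assumptions here, and the function names `spearman_rho` and `best_choice_accuracy` are hypothetical. The chance level of 1/5 follows only from assuming uniform random guessing over five candidates with a single preferred banner.

```python
from typing import Sequence

NUM_CANDIDATES = 5                  # each set holds five banners from one LP
CHANCE_LEVEL = 1 / NUM_CANDIDATES   # uniform guessing, one preferred banner (assumption)


def spearman_rho(model_rank: Sequence[int], human_rank: Sequence[int]) -> float:
    """Spearman's rho for two tie-free rankings of the same items.

    Assumed metric: the abstract only says "correlated", not which coefficient.
    """
    n = len(model_rank)
    if n != len(human_rank) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    # Sum of squared rank differences, item by item.
    d2 = sum((m - h) ** 2 for m, h in zip(model_rank, human_rank))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def best_choice_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of sets where the model picked the human-preferred banner."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need one prediction per non-empty gold list")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Made-up toy values, not results from the paper.
    human = [1, 2, 3, 4, 5]   # human rank of banners A..E
    model = [2, 1, 3, 4, 5]   # model swaps the top two banners
    print(spearman_rho(model, human))
    print(best_choice_accuracy([0, 1, 2, 3], [0, 1, 0, 0]), CHANCE_LEVEL)
```

The toy example also hints at why the two tasks can diverge: swapping the top two banners leaves the rank correlation high, yet the best-choice prediction for that set is wrong.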

Anthology ID: 2025.findings-emnlp.1311
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24145–24159
URL: https://aclanthology.org/2025.findings-emnlp.1311/
DOI: 10.18653/v1/2025.findings-emnlp.1311
Cite (ACL): Hiroto Otake, Peinan Zhang, Yusuke Sakai, Masato Mita, Hiroki Ouchi, and Taro Watanabe. 2025. BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24145–24159, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences (Otake et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1311.pdf
Checklist: 2025.findings-emnlp.1311.checklist.pdf
