Reference-Based Metrics Are Biased Against Blind and Low-Vision Users’ Image Description Preferences

Rhea Kapur, Elisa Kreiss


Abstract
Image description generation models are sophisticated Vision-Language Models which promise to make visual content, such as images, non-visually accessible through linguistic descriptions. While these systems can benefit all, their primary motivation tends to lie in allowing blind and low-vision (BLV) users access to increasingly visual (online) discourse. Well-defined evaluation methods are crucial for steering model development into socially useful directions. In this work, we show that the most popular evaluation metrics (reference-based metrics) are biased against BLV users and therefore potentially stifle useful model development. Reference-based metrics assign quality scores based on the similarity to human-generated ground-truth descriptions and are widely accepted as neutrally representing the needs of all users. However, we find that these metrics are more strongly correlated with sighted participant ratings than BLV ratings, and we explore factors which appear to mediate this finding: description length, the image’s context of appearance, and the number of reference descriptions available. These findings suggest that there is a need for developing evaluation methods that are established based on specific downstream user groups, and they highlight the importance of reflecting on emerging biases against minorities in the development of general-purpose automatic metrics.
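The comparison the abstract describes, scoring descriptions against references and then correlating those scores with each rater group's judgments, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: `unigram_f1` is a toy stand-in for a reference-based metric (it is not BLEU, CIDEr, or any metric evaluated in the paper), `pearson` and `group_correlations` are hypothetical helper names, and any ratings passed in would be supplied by the reader.

```python
from math import sqrt


def unigram_f1(candidate: str, references: list[str]) -> float:
    """Toy reference-based score: best unigram F1 against any reference.

    Assumption: whitespace tokenization and lowercase matching; real
    metrics use more elaborate n-gram or embedding comparisons.
    """
    cand = candidate.lower().split()
    best = 0.0
    for ref in references:
        ref_tokens = ref.lower().split()
        remaining = list(ref_tokens)
        overlap = 0
        for tok in cand:
            # Match each candidate token against at most one reference token.
            if tok in remaining:
                remaining.remove(tok)
                overlap += 1
        if overlap == 0:
            continue
        precision = overlap / len(cand)
        recall = overlap / len(ref_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def group_correlations(
    metric_scores: list[float], ratings_by_group: dict[str, list[float]]
) -> dict[str, float]:
    """Correlate one metric's scores with each rater group's ratings."""
    return {
        group: pearson(metric_scores, ratings)
        for group, ratings in ratings_by_group.items()
    }
```

Calling `group_correlations(scores, {"sighted": [...], "blv": [...]})` with per-description ratings from each group returns one coefficient per group. A gap between the two coefficients is the kind of disparity the abstract reports for reference-based metrics.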
Anthology ID:
2024.nlp4pi-1.26
Volume:
Proceedings of the Third Workshop on NLP for Positive Impact
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venue:
NLP4PI
Publisher:
Association for Computational Linguistics
Pages:
308–314
URL:
https://aclanthology.org/2024.nlp4pi-1.26
DOI:
10.18653/v1/2024.nlp4pi-1.26
Cite (ACL):
Rhea Kapur and Elisa Kreiss. 2024. Reference-Based Metrics Are Biased Against Blind and Low-Vision Users’ Image Description Preferences. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 308–314, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Reference-Based Metrics Are Biased Against Blind and Low-Vision Users’ Image Description Preferences (Kapur & Kreiss, NLP4PI 2024)
PDF:
https://aclanthology.org/2024.nlp4pi-1.26.pdf