Gary Farmaner


2024

Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models
James Fletcher | Nicholas Dehnen | Seyed Nima Tayarani Bathaie | Aijun An | Heidar Davoudi | Ron DiCarlantonio | Gary Farmaner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

To enhance a question-answering system for automotive drivers, we tackle the problem of automatically generating icon image descriptions. These descriptions can be matched against a driver’s query about an icon appearing on the dashboard and tell the driver what is happening so that they can take appropriate action. We use three state-of-the-art large vision-language models to generate both visual and functional descriptions from the icon image and its contextual information in the car manual, using both zero-shot and few-shot prompts. We create a dataset containing over 400 icons with their ground-truth descriptions and use it to evaluate model-generated descriptions across several performance metrics. Our evaluation shows that two of these models (GPT-4o and Claude 3.5) perform well on this task, while the third (LLaVA-NEXT) performs poorly.
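The abstract describes prompting a vision-language model with an icon image plus contextual text from the car manual. A minimal sketch of how such a zero-shot request might be assembled is shown below; the function name, prompt wording, and example context are illustrative assumptions, and the payload follows the OpenAI-style chat message format with an inline base64 image:

```python
import base64

def build_zero_shot_prompt(image_path: str, manual_context: str) -> list:
    """Assemble a zero-shot chat payload asking a VLM to describe a
    dashboard icon. Illustrative sketch only; the message schema follows
    the OpenAI-style multimodal chat format (text + image_url parts)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return [
        {"role": "system",
         "content": "You describe vehicle dashboard warning icons."},
        {"role": "user",
         "content": [
             {"type": "text",
              "text": ("Describe this icon's visual appearance, then explain "
                       "what it indicates to the driver. "
                       f"Context from the car manual: {manual_context}")},
             # Icon image embedded inline as a base64 data URL
             {"type": "image_url",
              "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
         ]},
    ]
```

A few-shot variant would prepend additional user/assistant message pairs, each containing an example icon image and its ground-truth description.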