Assessing Image-Captioning Models: A Novel Framework Integrating Statistical Analysis and Metric Patterns

Qiaomu Li, Ying Xie, Nina Grundlingh, Varsha Rani Chawan, Cody Wang


Abstract
In this study, we present a novel evaluation framework for image-captioning models that integrates statistical analysis with common evaluation metrics. Using two popular datasets with contrasting dataset variation, FashionGen and Amazon, we evaluate four models: Video-LLaVa, BLIP, CoCa and ViT-GPT2. Our approach not only reveals the comparative strengths of the models, offering insights into their adaptability and applicability in real-world scenarios, but also contributes to the field by providing a comprehensive evaluation method that considers both statistical significance and practical relevance to guide the selection of models for specific applications. Specifically, we propose Rank Score, a new evaluation metric designed for e-commerce image search applications, and employ CLIP Score to quantify dataset variation, offering a holistic view of model performance.
Anthology ID:
2024.ecnlp-1.9
Volume:
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Shervin Malmasi, Besnik Fetahu, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
Venues:
ECNLP | WS
Publisher:
ELRA and ICCL
Pages:
79–87
URL:
https://aclanthology.org/2024.ecnlp-1.9
Cite (ACL):
Qiaomu Li, Ying Xie, Nina Grundlingh, Varsha Rani Chawan, and Cody Wang. 2024. Assessing Image-Captioning Models: A Novel Framework Integrating Statistical Analysis and Metric Patterns. In Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024, pages 79–87, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Assessing Image-Captioning Models: A Novel Framework Integrating Statistical Analysis and Metric Patterns (Li et al., ECNLP-WS 2024)
PDF:
https://aclanthology.org/2024.ecnlp-1.9.pdf