The Role of Data Curation in Image Captioning

Wenyan Li, Jonas Lotz, Chen Qiu, Desmond Elliott


Abstract
Image captioning models are typically trained by treating all samples equally, neglecting to account for mismatched or otherwise difficult data points. In contrast, recent work has shown the effectiveness of training models by scheduling the data using curriculum learning strategies. This paper contributes to this direction by actively curating difficult samples in datasets without increasing the total number of samples. We explore the effect of using three data curation methods within the training process: complete removal of a sample, caption replacement, or image replacement via a text-to-image generation model. Experiments on the Flickr30K and COCO datasets with the BLIP and BEiT-3 models show that these curation methods yield improved image captioning models.
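The three curation options named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Sample` type, the per-sample `loss` field used as a difficulty score, the fixed `threshold`, and the `new_caption` / `new_image` callbacks (stand-ins for a caption source and a text-to-image model) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Sample:
    image: str    # image path or identifier
    caption: str
    loss: float   # assumed difficulty score; how it is computed is not specified here


def curate(
    samples: List[Sample],
    threshold: float,
    action: str,
    new_caption: Optional[Callable[[Sample], str]] = None,
    new_image: Optional[Callable[[Sample], str]] = None,
) -> List[Sample]:
    """Apply one curation action to every sample whose loss exceeds threshold."""
    curated = []
    for s in samples:
        if s.loss <= threshold:
            curated.append(s)                  # easy sample: keep unchanged
        elif action == "remove":
            continue                           # drop the difficult sample
        elif action == "replace_caption":
            curated.append(replace(s, caption=new_caption(s)))
        elif action == "replace_image":
            # new_image stands in for a text-to-image generation model
            curated.append(replace(s, image=new_image(s)))
        else:
            raise ValueError(f"unknown action: {action}")
    return curated


data = [Sample("a.jpg", "a dog on grass", 0.5), Sample("b.jpg", "a cat", 3.0)]
removed = curate(data, threshold=1.0, action="remove")
recaptioned = curate(data, 1.0, "replace_caption", new_caption=lambda s: "another caption")
reimaged = curate(data, 1.0, "replace_image", new_image=lambda s: "gen_" + s.image)
```

None of the three actions ever yields more samples than the input, which matches the abstract's constraint that curation does not increase the total number of samples.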
Anthology ID:
2024.eacl-long.65
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1074–1088
URL:
https://aclanthology.org/2024.eacl-long.65
Cite (ACL):
Wenyan Li, Jonas Lotz, Chen Qiu, and Desmond Elliott. 2024. The Role of Data Curation in Image Captioning. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1074–1088, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
The Role of Data Curation in Image Captioning (Li et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.65.pdf
Software:
 2024.eacl-long.65.software.zip