Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

Guisheng Liu; Yi Li; Zhengcong Fei; Haiyan Fu; Xiangyang Luo; Yanqing Guo

Abstract

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose a lightweight image captioning network combined with continuous diffusion, called Prefix-diffusion. To achieve diversity, we design an efficient method that injects prefix image embeddings into the denoising process of the diffusion model. To reduce the number of trainable parameters, we employ a pre-trained model to extract image features and design an additional mapping network. Prefix-diffusion generates diverse captions with relatively few parameters, while the generative capabilities of the diffusion model maintain the fluency and relevance of the captions. Our work paves the way for scaling up diffusion models for image captioning, and achieves promising performance compared with recent approaches.

Anthology ID: 2024.lrec-main.1134
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 12954–12965
URL: https://aclanthology.org/2024.lrec-main.1134/
Cite (ACL): Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, and Yanqing Guo. 2024. Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12954–12965, Torino, Italia. ELRA and ICCL.
Cite (Informal): Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning (Liu et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1134.pdf
