ReCAP: Semantic Role Enhanced Caption Generation

Abhidip Bhattacharyya; Martha Palmer; Christoffer Heckman

Abstract

Although current vision-language (V+L) models have achieved success in generating image captions, their captions often lack specificity and overlook various aspects of the image. Additionally, the attention learned through weak supervision operates opaquely and is difficult to control. To address these limitations, we propose the use of semantic roles as control signals in caption generation. Our hypothesis is that, by incorporating semantic roles as signals, the generated captions can be guided to follow specific predicate-argument structures. To validate the effectiveness of our approach, we conducted experiments using data and compared the results with a baseline model, VL-BART (CITATION). The experiments showed a significant improvement, with a gain of 45% in Smatch score (a standard NLP evaluation metric for semantic representations), demonstrating the efficacy of our approach. By focusing on specific objects and their associated semantic roles instead of providing a general description, our framework produces captions that exhibit enhanced quality, diversity, and controllability.

Anthology ID: 2024.lrec-main.1191
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 13633–13649
URL: https://aclanthology.org/2024.lrec-main.1191/
Cite (ACL): Abhidip Bhattacharyya, Martha Palmer, and Christoffer Heckman. 2024. ReCAP: Semantic Role Enhanced Caption Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13633–13649, Torino, Italia. ELRA and ICCL.
Cite (Informal): ReCAP: Semantic Role Enhanced Caption Generation (Bhattacharyya et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1191.pdf