Retrieval-augmented Video Encoding for Instructional Captioning

Yeonjoon Jung; Minsoo Kim; Seungtaek Choi; Jihyuk Kim; Minji Seo; Seung-won Hwang

doi:10.18653/v1/2023.findings-acl.543

Abstract

Instructional videos make learning more efficient by providing a detailed multimodal context for each procedure in an instruction. A unique challenge posed by instructional videos is key-object degeneracy, where any single modality fails to sufficiently capture the key objects referred to in the procedure. For machine systems, such degeneracy can degrade performance on a downstream task such as dense video captioning, leading to incorrect captions that omit key objects. To repair degeneracy, we propose a retrieval-based framework that augments the model representations in the presence of such key-object degeneracy. We validate the effectiveness and generalizability of our proposed framework over baselines using modalities with key-object degeneracy.

Anthology ID: 2023.findings-acl.543
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8554–8568
URL: https://aclanthology.org/2023.findings-acl.543
DOI: 10.18653/v1/2023.findings-acl.543
Cite (ACL): Yeonjoon Jung, Minsoo Kim, Seungtaek Choi, Jihyuk Kim, Minji Seo, and Seung-won Hwang. 2023. Retrieval-augmented Video Encoding for Instructional Captioning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8554–8568, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Retrieval-augmented Video Encoding for Instructional Captioning (Jung et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.543.pdf
