Title: A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Authors: Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu
Date: 2023-12
Type: text (conference publication)
Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics
Place: Singapore
Pages: 7380-7402
Identifier: kudo-etal-2023-challenging
DOI: 10.18653/v1/2023.emnlp-main.457
URL: https://aclanthology.org/2023.emnlp-main.457/