%0 Conference Proceedings
%T Multimodal Abstractive Summarization for How2 Videos
%A Palaskar, Shruti
%A Libovický, Jindřich
%A Gella, Spandana
%A Metze, Florian
%Y Korhonen, Anna
%Y Traum, David
%Y Màrquez, Lluís
%S Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%D 2019
%8 July
%I Association for Computational Linguistics
%C Florence, Italy
%F palaskar-etal-2019-multimodal
%X In this paper, we study abstractive summarization for open-domain videos. Unlike traditional text news summarization, the goal is less to “compress” text information than to provide a fluent textual summary of information that has been collected and fused from different source modalities, in our case video and audio transcripts (or text). We show how a multi-source sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, compare various models trained with different modalities, and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric (Content F1) for the abstractive summarization task that measures semantic adequacy rather than fluency of the summaries, which is covered by metrics like ROUGE and BLEU.
%R 10.18653/v1/P19-1659
%U https://aclanthology.org/P19-1659
%U https://doi.org/10.18653/v1/P19-1659
%P 6587-6596