Video Paragraph Captioning as a Text Summarization Task

Hui Liu, Xiaojun Wan


Abstract
Video paragraph captioning aims to generate a set of coherent sentences to describe a video that contains several events. Most previous methods simplify this task by using ground-truth event segments. In this work, we propose a novel framework that treats it as a text summarization task. We first generate many sentence-level captions focusing on different video clips and then summarize these captions to obtain the final paragraph caption. Our method does not depend on ground-truth event segments. Experiments on two popular datasets, ActivityNet Captions and YouCookII, demonstrate the advantages of our new framework. On the ActivityNet dataset, our method even outperforms some previous methods using ground-truth event segment labels.
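The two-stage idea in the abstract (caption many clips, then summarize the captions) can be illustrated with a toy sketch. Everything named here is an assumption made for illustration: `sliding_clips`, `summarize`, `paragraph_caption`, and `toy_captioner` are hypothetical helpers, the window and stride values are arbitrary, and the summarizer is a naive duplicate-dropping stand-in rather than the trained summarization model the paper would use.

```python
from typing import Callable, List, Tuple

Clip = Tuple[float, float]  # (start_sec, end_sec); hypothetical clip representation


def sliding_clips(duration: float, window: float, stride: float) -> List[Clip]:
    """Cut a video of `duration` seconds into overlapping fixed-length clips.

    No ground-truth event segments are needed; clips are placed uniformly.
    """
    clips: List[Clip] = []
    start = 0.0
    while start < duration:
        clips.append((start, min(start + window, duration)))
        start += stride
    return clips


def summarize(captions: List[str], max_sentences: int) -> str:
    """Naive stand-in for a summarizer: drop exact duplicates (case-insensitive),
    keep temporal order, and truncate to `max_sentences` sentences."""
    seen = set()
    kept: List[str] = []
    for caption in captions:
        text = caption.strip()
        key = text.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(text)
    return " ".join(kept[:max_sentences])


def paragraph_caption(
    duration: float,
    caption_clip: Callable[[Clip], str],
    window: float = 10.0,
    stride: float = 5.0,
    max_sentences: int = 3,
) -> str:
    """Stage 1: caption every clip. Stage 2: summarize the captions."""
    clips = sliding_clips(duration, window, stride)
    captions = [caption_clip(clip) for clip in clips]
    return summarize(captions, max_sentences)


def toy_captioner(clip: Clip) -> str:
    """Placeholder for a sentence-level video captioning model (assumption)."""
    start, _ = clip
    return "A man chops vegetables." if start < 10 else "He fries them in a pan."


if __name__ == "__main__":
    print(paragraph_caption(20.0, toy_captioner))
```

In this sketch, overlapping clips yield redundant captions, and the summarization step is what collapses them into a short paragraph; a real system would replace both the captioner and the summarizer with learned models.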
Anthology ID:
2021.acl-short.9
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
55–60
URL:
https://aclanthology.org/2021.acl-short.9
DOI:
10.18653/v1/2021.acl-short.9
Cite (ACL):
Hui Liu and Xiaojun Wan. 2021. Video Paragraph Captioning as a Text Summarization Task. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 55–60, Online. Association for Computational Linguistics.
Cite (Informal):
Video Paragraph Captioning as a Text Summarization Task (Liu & Wan, ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-short.9.pdf
Video:
https://aclanthology.org/2021.acl-short.9.mp4
Data
ActivityNet Captions