Multi-Task Video Captioning with Video and Entailment Generation

Ramakanth Pasunuru, Mohit Bansal


Abstract
Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder representations, and a logically-directed language entailment generation task to learn better video-entailing caption decoder representations. For this, we present a many-to-many multi-task learning model that shares parameters across the encoders and decoders of the three tasks. We achieve significant improvements and the new state-of-the-art on several standard video captioning datasets using diverse automatic and human evaluations. We also show mutual multi-task improvements on the entailment generation task.
Anthology ID:
P17-1117
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1273–1283
URL:
https://aclanthology.org/P17-1117
DOI:
10.18653/v1/P17-1117
Cite (ACL):
Ramakanth Pasunuru and Mohit Bansal. 2017. Multi-Task Video Captioning with Video and Entailment Generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1273–1283, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Multi-Task Video Captioning with Video and Entailment Generation (Pasunuru & Bansal, ACL 2017)
PDF:
https://aclanthology.org/P17-1117.pdf
Note:
 P17-1117.Notes.pdf
Video:
 https://aclanthology.org/P17-1117.mp4
Data
ImageNet, MSVD, SNLI, UCF101