Semi-Supervised Learning for Video Captioning

Ke Lin; Zhuoxin Gan; Liwei Wang

doi:10.18653/v1/2020.findings-emnlp.98

Abstract

Deep neural networks have achieved great success on video captioning in the supervised learning setting. However, annotating videos with descriptions is expensive and time-consuming. If a video captioning algorithm can benefit from a large number of unlabeled videos, the cost of annotation can be reduced. In this study, we make the first attempt to train a video captioning model on labeled and unlabeled data jointly, in a semi-supervised learning manner. For labeled data, we train the model with the traditional cross-entropy loss. For unlabeled data, we leverage a self-critical policy gradient method whose reward function is the difference between the scores obtained by Monte-Carlo sampling and by greedy decoding, where each score is the negative K-L divergence between the output distributions for the original video data and the augmented video data. The final loss is the weighted sum of the losses from labeled and unlabeled data. Experiments conducted on the VATEX, MSR-VTT and MSVD datasets demonstrate that introducing unlabeled data improves the performance of the video captioning model. The proposed semi-supervised learning algorithm also outperforms several state-of-the-art semi-supervised learning approaches.
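
The loss structure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the direction of the K-L divergence, the averaging over decoding steps, and the representation of model outputs as per-step word distributions are all assumptions made here for the example. In a real training loop the reward would be treated as a constant when differentiating.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail assumed here


def cross_entropy(probs, targets):
    """Supervised loss: mean negative log-likelihood of the reference tokens.

    probs is a list of per-step word distributions, targets a list of token ids.
    """
    return -sum(math.log(step[t] + EPS) for step, t in zip(probs, targets)) / len(targets)


def consistency_score(p_orig, p_aug):
    """Score of a caption: negative K-L divergence between the output
    distributions for the original and the augmented video.

    Assumption: KL(p_orig || p_aug), averaged over decoding steps.
    """
    kl_per_step = [
        sum(p * (math.log(p + EPS) - math.log(q + EPS)) for p, q in zip(po, pa))
        for po, pa in zip(p_orig, p_aug)
    ]
    return -sum(kl_per_step) / len(kl_per_step)


def self_critical_loss(logp_sampled, score_sampled, score_greedy):
    """Unsupervised loss: policy-gradient surrogate in which the greedy
    caption's score is the baseline for the Monte-Carlo sampled caption."""
    reward = score_sampled - score_greedy  # held constant w.r.t. gradients
    return -reward * sum(logp_sampled)


def total_loss(supervised, unsupervised, weight):
    """Weighted sum of the labeled and unlabeled losses."""
    return supervised + weight * unsupervised


# Toy example: two decoding steps over a three-word vocabulary.
p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # outputs for the original video
q = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]  # outputs for the augmented video
sup = cross_entropy(p, [0, 1])
unsup = self_critical_loss(
    [math.log(0.5), math.log(0.5)],           # log-probs of the sampled words
    score_sampled=consistency_score(p, q),
    score_greedy=consistency_score(p, p),
)
loss = total_loss(sup, unsup, weight=0.5)      # weight value is hypothetical
```

Under this reading, a sampled caption is reinforced when its outputs are more consistent across the original and augmented video than those of the greedy caption, and discouraged otherwise.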

Anthology ID: 2020.findings-emnlp.98
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1096–1106
URL: https://aclanthology.org/2020.findings-emnlp.98/
DOI: 10.18653/v1/2020.findings-emnlp.98
Cite (ACL): Ke Lin, Zhuoxin Gan, and Liwei Wang. 2020. Semi-Supervised Learning for Video Captioning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1096–1106, Online. Association for Computational Linguistics.
Cite (Informal): Semi-Supervised Learning for Video Captioning (Lin et al., Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.98.pdf
Data: MSVD, VATEX
