An Efficient Keyframes Selection Based Framework for Video Captioning

Alok Singh, Loitongbam Sanayai Meetei, Salam Michael Singh, Thoudam Doren Singh, Sivaji Bandyopadhyay


Abstract
Describing a video is a challenging yet attractive task because it lies at the intersection of computer vision and natural language generation. Attention-based models have reported the best performance. However, all these models follow similar procedures, such as segmenting videos into chunks of frames or sampling frames at equal intervals for visual encoding. Because a video consists of a sequence of similar frames and suffers from inescapable noise such as uneven illumination, occlusion and motion effects, these procedures encode redundant visual information and incur additional computational cost. In this paper, a boundary-based keyframes selection approach for video description is proposed that allows the system to select a compact subset of keyframes to encode the visual information and generate a description for a video without much degradation. The proposed approach uses 3 to 4 frames per video and yields competitive performance on two benchmark datasets, MSVD and MSR-VTT (in both English and Hindi).
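The abstract does not say how segment boundaries are detected, so the following is only a minimal sketch of the general idea of boundary-based keyframe selection, contrasted with equal-interval sampling. It assumes frames are flattened grayscale intensity lists and that a boundary is declared when the mean absolute difference between consecutive frames exceeds a threshold. The names `frame_distance`, `select_keyframes`, `uniform_sample` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical representation: one frame = flattened grayscale intensities.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute intensity difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of frames that open a new segment.

    Assumption (not from the paper): a boundary occurs wherever the
    distance between consecutive frames exceeds `threshold`.
    """
    if not frames:
        return []
    keyframes = [0]  # the first frame always opens the first segment
    for i in range(1, len(frames)):
        if frame_distance(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


def uniform_sample(n_frames: int, step: int) -> List[int]:
    """Baseline: pick every `step`-th frame regardless of content."""
    return list(range(0, n_frames, step))


# Toy video: three visually distinct runs of near-identical frames.
video = [[0.0, 0.0]] * 3 + [[10.0, 10.0]] * 3 + [[50.0, 50.0]] * 2
boundary_based = select_keyframes(video, threshold=5.0)
equal_interval = uniform_sample(len(video), step=2)
```

On this toy input the boundary-based selection keeps one frame per visually distinct run, while equal-interval sampling keeps more frames, some of them redundant copies of the same run.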
Anthology ID:
2021.icon-main.29
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
240–250
URL:
https://aclanthology.org/2021.icon-main.29
Cite (ACL):
Alok Singh, Loitongbam Sanayai Meetei, Salam Michael Singh, Thoudam Doren Singh, and Sivaji Bandyopadhyay. 2021. An Efficient Keyframes Selection Based Framework for Video Captioning. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 240–250, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
An Efficient Keyframes Selection Based Framework for Video Captioning (Singh et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.29.pdf
Optional supplementary material:
 2021.icon-main.29.OptionalSupplementaryMaterial.pdf
Data
Hindi MSR-VTT, MSVD