ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos

Arpan Phukan, Manish Gupta, Asif Ekbal


Abstract
Previous studies on question generation from videos have mostly focused on generating questions about common objects and attributes, and hence are not entity-centric. In this work, we focus on the generation of entity-centric information-seeking questions from videos. Such a system could be useful for video-based learning, recommending “People Also Ask” questions, video-based chatbots, and fact-checking. Our work addresses three key challenges: identifying question-worthy information, linking it to entities, and effectively utilizing multimodal signals. Further, to the best of our knowledge, no large-scale dataset exists for this task: most video question generation datasets cover TV shows, movies, or human activities, or lack entity-centric information-seeking questions. Hence, we contribute a diverse dataset of YouTube videos, VideoQuestions, consisting of 411 videos with 2265 manually annotated questions. We further propose a model architecture combining Transformers, rich context signals (titles, transcripts, captions, embeddings), and a combined cross-entropy and contrastive loss to encourage entity-centric question generation. Our best method yields BLEU, ROUGE, CIDEr, and METEOR scores of 71.3, 78.6, 7.31, and 81.9, respectively, demonstrating practical usability. We make the code and dataset publicly available.
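The abstract mentions training with a combination of cross-entropy and contrastive loss but does not give the formulation. The sketch below is only an illustration of how two such terms are commonly combined, not the authors' implementation. The InfoNCE-style contrastive term, the cosine similarity, the weight `alpha`, the `temperature` value, and all function names are assumptions introduced here; the anchor, positive, and negative vectors stand in for hypothetical question embeddings.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style term (an assumption, not taken from the paper):
    low when the anchor is closer to the positive than to any negative."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    scaled = [s / temperature for s in sims]
    return cross_entropy(scaled, 0)  # the positive sits at index 0

def combined_loss(token_logits, token_targets, anchor, positive, negatives,
                  alpha=0.5, temperature=0.1):
    """Mean token-level cross-entropy plus a weighted contrastive term.
    `alpha` is a hypothetical weighting hyperparameter."""
    ce = sum(cross_entropy(l, t) for l, t in zip(token_logits, token_targets))
    ce /= len(token_targets)
    cl = contrastive_loss(anchor, positive, negatives, temperature)
    return ce + alpha * cl
```

In this sketch the cross-entropy term rewards reproducing the reference question token by token, while the contrastive term would push a generated question's embedding toward an entity-centric reference and away from generic alternatives; how the paper actually constructs positives and negatives is not stated in the abstract.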
Anthology ID:
2024.emnlp-main.798
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
14411–14436
URL:
https://aclanthology.org/2024.emnlp-main.798
Cite (ACL):
Arpan Phukan, Manish Gupta, and Asif Ekbal. 2024. ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14411–14436, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos (Phukan et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.798.pdf