A Dataset for Programming-based Instructional Video Classification and Question Answering

Sana Javaid Raja, Adeel Zafar, Aqsa Shoaib


Abstract
This work aims to develop an understanding of the rapidly emerging field of VideoQA, particularly in the context of instructional programming videos. It also encourages the design of systems that can produce visual answers to programming-based natural language questions. We introduce two datasets: CodeVidQA, with 2,104 question-answer pair links with timestamps taken from programming videos of Stack Overflow for the Programming Visual Answer Localization task, and CodeVidCL, with 4,331 videos (1,751 programming, 2,580 non-programming) for the Programming Video Classification task. In addition, we propose a framework that adapts BigBird and SVM for video classification. The proposed approach achieves a high accuracy of 99.61% for video classification.
Anthology ID:
2025.evalmg-1.1
Volume:
Proceedings of the First Workshop of Evaluation of Multi-Modal Generation
Month:
Jan
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Wei Emma Zhang, Xiang Dai, Desmond Elliot, Byron Fang, Mongyuan Sim, Haojie Zhuang, Weitong Chen
Venues:
EvalMG | WS
Publisher:
Association for Computational Linguistics
Pages:
1–9
URL:
https://aclanthology.org/2025.evalmg-1.1/
Cite (ACL):
Sana Javaid Raja, Adeel Zafar, and Aqsa Shoaib. 2025. A Dataset for Programming-based Instructional Video Classification and Question Answering. In Proceedings of the First Workshop of Evaluation of Multi-Modal Generation, pages 1–9, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A Dataset for Programming-based Instructional Video Classification and Question Answering (Raja et al., EvalMG 2025)
PDF:
https://aclanthology.org/2025.evalmg-1.1.pdf