Wenhao Fang


pdf bib
Knowledge-Guided Cross-Topic Visual Question Generation
Hongfei Liu | Guohua Wang | Jiayuan Xie | Jiali Chen | Wenhao Fang | Yi Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Visual question generation (VQG) task aims to generate high-quality questions based on the input image. Current methods primarily focus on generating questions containing specified content utilizing answers or question types as constraints. However, these constraints make it challenging to control the topic of generated questions (e.g., conversation or test subject topics) for various applications. Thus, it is necessary to utilize topics as constraints to guide question generation. Considering that there are many topics and it is almost impossible for human annotations to cover them, we propose the cross-topic learning VQG (CTL-VQG) task, which aims to generate questions related to unseen topics in cross-topic scenarios. In this paper, we propose a knowledge-guided cross-topic visual question generation (KC-VQG) model to extract unseen topic-related information for question generation. Specifically, an image-topic feature extractor is introduced in our model to extract topic-related intuitive visual features; an image-topic knowledge extractor is used to extract and select the most appropriate topic-related implicit knowledge from large language models for generating questions. Extensive experiments show that our model outperforms baselines and can effectively generate unseen topic-related questions in cross-topic scenarios.