Smruti Smita Lenka

2025

This paper introduces OVQA, the first multimodal dataset designed for visual question answering (VQA), visual question elicitation (VQE), and multimodal research for the low-resource Odia language. The dataset was created by manually translating 6,149 English question-answer pairs, associated with 6,149 unique images from the Visual Genome dataset. This effort resulted in 27,809 English-Odia parallel sentences, each checked for a semantic match with the corresponding visual information. Several baseline experiments were conducted on the dataset, covering both visual question answering and visual question elicitation. As the first VQA dataset for Odia, it will be released for multimodal research and is intended to help researchers extend this work to other low-resource languages.

Co-authors

Ketan Kotwal 1

Shantipriya Parida 1

Shashikanta Sahoo 1

Kalyanamalini Sahoo 1

Sambit Sekhar 1

Venues

IndoNLP 1
WS 1
