Shashikanta Sahoo


2025

OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language
Shantipriya Parida | Shashikanta Sahoo | Sambit Sekhar | Kalyanamalini Sahoo | Ketan Kotwal | Sonal Khosla | Satya Ranjan Dash | Aneesh Bose | Guneet Singh Kohli | Smruti Smita Lenka | Ondřej Bojar
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages

This paper introduces OVQA, the first multimodal dataset designed for visual question answering (VQA), visual question elicitation (VQE), and multimodal research for the low-resource Odia language. The dataset was created by manually translating 6,149 English question-answer pairs, each associated with one of 6,149 unique images from the Visual Genome dataset. This effort resulted in 27,809 English-Odia parallel sentences, ensuring a semantic match with the corresponding visual information. Several baseline experiments were conducted on the dataset, including visual question answering and visual question elicitation. As the first VQA dataset for the low-resource Odia language, OVQA will be released for multimodal research and can help researchers extend this work to other low-resource languages.
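The abstract does not specify the released data schema; the following is a minimal sketch, with hypothetical field names and an assumed JSON Lines layout, of how one OVQA-style record pairing a Visual Genome image with parallel English-Odia question-answer text might be represented and loaded.

```python
# Minimal sketch of an OVQA-style record; field names and file format are
# assumptions for illustration, not the dataset's actual schema.
from dataclasses import dataclass
import json

@dataclass
class OVQASample:
    image_id: str      # Visual Genome image identifier
    question_en: str   # English question
    question_od: str   # Odia (manually translated) question
    answer_en: str     # English answer
    answer_od: str     # Odia (manually translated) answer

def load_samples(path: str) -> list[OVQASample]:
    """Load OVQA-style samples from a JSON Lines file (assumed format)."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            samples.append(OVQASample(**json.loads(line)))
    return samples
```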

2024

OdiaGenAI’s Participation in WMT2024 English-to-Low Resource Multimodal Translation Task
Shantipriya Parida | Shashikanta Sahoo | Sambit Sekhar | Upendra Jena | Sushovan Jena | Kusum Lata
Proceedings of the Ninth Conference on Machine Translation

This paper describes the system submitted by team “ODIAGEN” to the WMT 2024 English-to-Low-Resource Multimodal Translation Task. We participated in two of the subtasks: Text-only Translation and Multimodal Translation. For Text-only Translation, we trained the Mistral-7B model for translation from English into Hindi, Bengali, Malayalam, and Hausa. For Multimodal Translation (using both image and text), we trained the PaliGemma-3B model for English-to-Hindi translation.
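The abstract does not detail the training or inference pipeline; below is a minimal inference-side sketch, assuming the Hugging Face transformers PaliGemma classes, a publicly available base checkpoint, and a hypothetical prompt template, of how an image plus English text could be fed to a PaliGemma-3B model to produce a Hindi translation. It is not the authors' actual setup.

```python
# Illustrative sketch only: loads a PaliGemma-3B checkpoint via Hugging Face
# transformers and prompts it to translate an English sentence into Hindi
# conditioned on the image. Checkpoint and prompt format are assumptions.
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import torch

model_id = "google/paligemma-3b-pt-224"  # assumed base checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

image = Image.open("example.jpg")
prompt = "translate en to hi: A man is riding a bicycle."  # hypothetical prompt template

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```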