@inproceedings{olufemi-etal-2025-challenging,
title = "Challenging Multimodal {LLM}s with {A}frican Standardized Exams: A Document {VQA} Evaluation",
author = "Olufemi, Victor Tolulope and
Babatunde, Oreoluwa Boluwatife and
Bolarinwa, Emmanuel and
Moshood, Kausar Yetunde",
editor = "Lignos, Constantine and
Abdulmumin, Idris and
Adelani, David",
booktitle = "Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.africanlp-1.22/",
doi = "10.18653/v1/2025.africanlp-1.22",
pages = "150--157",
ISBN = "979-8-89176-257-2",
abstract = "Despite rapid advancements in multimodal large language models (MLLMs), their ability to process low-resource African languages in document-based visual question answering (VQA) tasks remains limited. This paper evaluates three state-of-the-art MLLMs{---}GPT-4o, Claude-3.5 Haiku, and Gemini-1.5 Pro{---}on WAEC/NECO standardized exam questions in Yoruba, Igbo, and Hausa. We curate a dataset of multiple-choice questions from exam images and compare model accuracies across two prompting strategies: (1) using English prompts for African language questions, and (2) using native-language prompts. While GPT-4o achieves over 90{\%} accuracy for English, performance drops below 40{\%} for African languages, highlighting severe data imbalance in model training. Notably, native-language prompting improves accuracy for most models, yet no system approaches human-level performance, which reaches over 50{\%} in Yoruba, Igbo, and Hausa. These findings emphasize the need for diverse training data, fine-tuning, and dedicated benchmarks that address the linguistic intricacies of African languages in multimodal tasks, paving the way for more equitable and effective AI systems in education."
}