@inproceedings{baghel-etal-2025-towards,
title = "Towards Blind and Low-Vision Accessibility of Lightweight {VLM}s and Custom {LLM}-Evals",
author = "Baghel, Shruti Singh and
Rathore, Yash Pratap Singh and
Pradhan, Anurag and
Jena, Sushovan and
Bhavsar, Arnav and
Shukla, Amit and
Goyal, Pawan",
editor = "Shukla, Ankita and
Kumar, Sandeep and
Bedi, Amrit Singh and
Chakraborty, Tanmoy",
booktitle = "Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.mmloso-1.8/",
pages = "86--94",
ISBN = "979-8-89176-311-1",
abstract = "Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions but their high memory, computation, and deployment demands hinder practical use particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor), and Charades (indoor). In this work, we introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework evaluating spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework focusing on mobility-critical information. Additionally, we conduct a systematic evaluation of four different prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices."
}