Can Small Vision–Language Models Perform Sign Language Translation?

Anal Roy Chowdhury; Debarshi Kumar Sanyal

doi:10.18653/v1/2026.findings-acl.1609

Abstract

Vision-Language Models (VLMs) have shown strong generalization across multimodal tasks, but their capacity to handle sign language translation (SLT), which requires fine-grained spatiotemporal reasoning and linguistic understanding, remains unclear. In this study, we evaluate whether small VLMs (≤3B parameters) can perform SLT effectively. We perform supervised fine-tuning on four publicly available SLT datasets spanning three sign languages: one German (DGS), two American (ASL), and one Indian (ISL). We apply parameter-efficient LoRA to the language decoder and train the connector, while keeping the vision encoder frozen. To evaluate translation quality, we propose entity- and semantics-aware metrics tailored for SLT. We also highlight the data imbalance issues present in these widely used SLT datasets. Our analysis reveals the limitations of applying general-purpose VLMs to SLT, in contrast to their applicability in other tasks, and provides insights to inform future development of VLMs for sign language processing (SLP), which is essential for building inclusive AI applications.

Anthology ID: 2026.findings-acl.1609
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 32150–32166
URL: https://aclanthology.org/2026.findings-acl.1609/
DOI: 10.18653/v1/2026.findings-acl.1609
Cite (ACL): Anal Roy Chowdhury and Debarshi Kumar Sanyal. 2026. Can Small Vision–Language Models Perform Sign Language Translation? In Findings of the Association for Computational Linguistics: ACL 2026, pages 32150–32166, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Can Small Vision–Language Models Perform Sign Language Translation? (Chowdhury & Sanyal, Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1609.pdf
Checklist: 2026.findings-acl.1609.checklist.pdf
