A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance

Peshala Sandali Perera, Deshan Koshala Sumanathilaka


Abstract
Dyslexia in adults remains an under-researched and under-served area, particularly in non-English-speaking contexts, despite its significant impact on personal and professional lives. This work addresses that gap by focusing on Sinhala, a low-resource language with limited tools for linguistic accessibility. We present an assistive system designed specifically for Sinhala-speaking adults with dyslexia. The system integrates Whisper for speech-to-text conversion; SinBERT, an open-source fine-tuned BERT model trained for Sinhala, to identify common dyslexic errors; and a combined mT5 and Mistral-based model to generate corrected text. Finally, the output is converted back to speech using gTTS, creating a complete multimodal feedback loop. Despite the challenges posed by limited Sinhala-language datasets, the system achieves 66% transcription accuracy, 70% correction accuracy, and 65% overall system accuracy. These results demonstrate both the feasibility and effectiveness of the approach. Ultimately, this work highlights the importance of inclusive NLP technologies in underrepresented languages and showcases a practical step toward improving accessibility for adult dyslexic users.
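The four stages named in the abstract (speech-to-text, error identification, correction, text-to-speech) can be pictured as a simple chain of functions. The following is only a minimal sketch of that data flow, not the authors' implementation: `run_pipeline`, `PipelineResult`, and the `toy_*` stand-ins are hypothetical names introduced here for illustration, and the toy stages use plain string operations on an English placeholder word rather than Whisper, SinBERT, mT5/Mistral, or gTTS.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    transcript: str            # output of the speech-to-text stage
    flagged_tokens: List[str]  # tokens the detector marked as likely errors
    corrected_text: str        # output of the correction stage
    audio: bytes               # output of the text-to-speech stage


def run_pipeline(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    detect_errors: Callable[[str], List[str]],
    correct: Callable[[str, List[str]], str],
    synthesize: Callable[[str], bytes],
) -> PipelineResult:
    """Chain the four stages in the order the abstract lists them."""
    transcript = transcribe(audio_in)
    flagged = detect_errors(transcript)
    corrected = correct(transcript, flagged)
    audio_out = synthesize(corrected)
    return PipelineResult(transcript, flagged, corrected, audio_out)


# Toy stand-ins (assumptions for illustration only). In the described system
# these roles are filled by Whisper, SinBERT, an mT5/Mistral model, and gTTS.
TOY_FIXES = {"teh": "the"}


def toy_transcribe(audio: bytes) -> str:
    # Pretend the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


def toy_detect(text: str) -> List[str]:
    # Flag any token that appears in the toy misspelling table.
    return [tok for tok in text.split() if tok in TOY_FIXES]


def toy_correct(text: str, flagged: List[str]) -> str:
    # Replace only the flagged tokens; leave everything else untouched.
    return " ".join(TOY_FIXES[tok] if tok in flagged else tok for tok in text.split())


def toy_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized speech.
    return text.encode("utf-8")
```

Calling `run_pipeline(b"teh cat", toy_transcribe, toy_detect, toy_correct, toy_synthesize)` walks one input through all four stages. Passing the stages in as callables is a design choice of this sketch: it keeps each model replaceable without touching the orchestration code.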
Anthology ID:
2025.ranlp-1.106
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
925–933
URL:
https://aclanthology.org/2025.ranlp-1.106/
Cite (ACL):
Peshala Sandali Perera and Deshan Koshala Sumanathilaka. 2025. A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 925–933, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance (Perera & Sumanathilaka, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.106.pdf