Peshala Sandali Perera
2025
A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance
Peshala Sandali Perera
|
Deshan Koshala Sumanathilaka
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Dyslexia in adults remains an under-researched and under-served area, particularly in non-English-speaking contexts, despite its significant impact on personal and professional lives. This work addresses that gap by focusing on Sinhala, a low-resource language with limited tools for linguistic accessibility. We present an assistive system designed specifically for Sinhala-speaking adults with dyslexia. The system integrates Whisper for speech-to-text conversion, SinBERT a open sourced fine-tuned BERT model trained for Sinhala to identify common dyslexic errors, and a combined mT5 and Mistral-based model to generate corrected text. Finally, the output is converted back to speech using gTTS, creating a complete multi modal feedback loop. Despite the challenges posed by limited Sinhala-language datasets, the system achieves 66% transcription accuracy and 70% correction accuracy with 65% overall system accuracy. These results demonstrate both the feasibility and effectiveness of the approach. Ultimately, this work highlights the importance of inclusive NLP technologies in underrepresented languages and showcases a practical step toward improving accessibility for adult dyslexic users.