@inproceedings{attaluri-etal-2025-emotion,
title = "Emotion-Aware Dysarthric Speech Reconstruction: {LLM}s and Multimodal Evaluation with {MCDS}",
author = "Attaluri, Kaushal and
Mamidi, Radhika and
Chittepu, Sireesha and
Chebolu, Anirudh and
Thogarcheti, Hitendra Sarma",
editor = "Inui, Kentaro and
Sakti, Sakriani and
Wang, Haofen and
Wong, Derek F. and
Bhattacharyya, Pushpak and
Banerjee, Biplab and
Ekbal, Asif and
Chakraborty, Tanmoy and
Singh, Dhirendra Pratap",
booktitle = "Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "The Asian Federation of Natural Language Processing and The Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-ijcnlp.63/",
pages = "1072--1080",
ISBN = "979-8-89176-303-6",
abstract = "Dysarthria, a motor speech disorder affecting over 46 million individuals globally, impairs both intelligibility and emotional expression in communication. This work introduces a novel framework for emotion-aware sentence reconstruction from dysarthric speech using Large Language Models (LLMs) fine-tuned with QLoRA, namely LLaMA 3.1 and Mistral 8x7B. Our pipeline integrates direct emotion recognition from raw audio and conditions textual reconstruction on this emotional context to enhance both semantic and affective fidelity.We propose the Multimodal Communication Dysarthria Score (MCDS), a holistic evaluation metric combining BLEU, semantic similarity, emotion consistency, and human ratings:MCDS={\ensuremath{\alpha}}B+{\ensuremath{\beta}}E+{\ensuremath{\gamma}}S+{\ensuremath{\delta}}Hwhere $\alpha + \beta + \gamma + \delta = 1$.On our extended TORGO+ dataset, our emotion-aware LLM model achieves a MCDS of 0.87 and BLEU of 72.4{\%}, significantly outperforming traditional pipelines like Kaldi GMM-HMM (MCDS: 0.52, BLEU: 38.1{\%}) and Whisper-based models. It also surpasses baseline LLM systems by 0.09 MCDS. This sets a new benchmark in emotionally intelligent dysarthric speech reconstruction, with future directions including multilingual support and real-time deployment."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="attaluri-etal-2025-emotion">
<titleInfo>
<title>Emotion-Aware Dysarthric Speech Reconstruction: LLMs and Multimodal Evaluation with MCDS</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kaushal</namePart>
<namePart type="family">Attaluri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Radhika</namePart>
<namePart type="family">Mamidi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sireesha</namePart>
<namePart type="family">Chittepu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anirudh</namePart>
<namePart type="family">Chebolu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hitendra</namePart>
<namePart type="given">Sarma</namePart>
<namePart type="family">Thogarcheti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kentaro</namePart>
<namePart type="family">Inui</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakriani</namePart>
<namePart type="family">Sakti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Haofen</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Derek</namePart>
<namePart type="given">F</namePart>
<namePart type="family">Wong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pushpak</namePart>
<namePart type="family">Bhattacharyya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Biplab</namePart>
<namePart type="family">Banerjee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Asif</namePart>
<namePart type="family">Ekbal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dhirendra</namePart>
<namePart type="given">Pratap</namePart>
<namePart type="family">Singh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>The Asian Federation of Natural Language Processing and The Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Mumbai, India</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-303-6</identifier>
</relatedItem>
<abstract>Dysarthria, a motor speech disorder affecting over 46 million individuals globally, impairs both intelligibility and emotional expression in communication. This work introduces a novel framework for emotion-aware sentence reconstruction from dysarthric speech using Large Language Models (LLMs) fine-tuned with QLoRA, namely LLaMA 3.1 and Mistral 8x7B. Our pipeline integrates direct emotion recognition from raw audio and conditions textual reconstruction on this emotional context to enhance both semantic and affective fidelity. We propose the Multimodal Communication Dysarthria Score (MCDS), a holistic evaluation metric combining BLEU, semantic similarity, emotion consistency, and human ratings: MCDS = αB + βE + γS + δH, where α + β + γ + δ = 1. On our extended TORGO+ dataset, our emotion-aware LLM model achieves an MCDS of 0.87 and BLEU of 72.4%, significantly outperforming traditional pipelines like Kaldi GMM-HMM (MCDS: 0.52, BLEU: 38.1%) and Whisper-based models. It also surpasses baseline LLM systems by 0.09 MCDS. This sets a new benchmark in emotionally intelligent dysarthric speech reconstruction, with future directions including multilingual support and real-time deployment.</abstract>
<identifier type="citekey">attaluri-etal-2025-emotion</identifier>
<location>
<url>https://aclanthology.org/2025.findings-ijcnlp.63/</url>
</location>
<part>
<date>2025-12</date>
<extent unit="page">
<start>1072</start>
<end>1080</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Emotion-Aware Dysarthric Speech Reconstruction: LLMs and Multimodal Evaluation with MCDS
%A Attaluri, Kaushal
%A Mamidi, Radhika
%A Chittepu, Sireesha
%A Chebolu, Anirudh
%A Thogarcheti, Hitendra Sarma
%Y Inui, Kentaro
%Y Sakti, Sakriani
%Y Wang, Haofen
%Y Wong, Derek F.
%Y Bhattacharyya, Pushpak
%Y Banerjee, Biplab
%Y Ekbal, Asif
%Y Chakraborty, Tanmoy
%Y Singh, Dhirendra Pratap
%S Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
%D 2025
%8 December
%I The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
%C Mumbai, India
%@ 979-8-89176-303-6
%F attaluri-etal-2025-emotion
%X Dysarthria, a motor speech disorder affecting over 46 million individuals globally, impairs both intelligibility and emotional expression in communication. This work introduces a novel framework for emotion-aware sentence reconstruction from dysarthric speech using Large Language Models (LLMs) fine-tuned with QLoRA, namely LLaMA 3.1 and Mistral 8x7B. Our pipeline integrates direct emotion recognition from raw audio and conditions textual reconstruction on this emotional context to enhance both semantic and affective fidelity. We propose the Multimodal Communication Dysarthria Score (MCDS), a holistic evaluation metric combining BLEU, semantic similarity, emotion consistency, and human ratings: MCDS = αB + βE + γS + δH, where α + β + γ + δ = 1. On our extended TORGO+ dataset, our emotion-aware LLM model achieves an MCDS of 0.87 and BLEU of 72.4%, significantly outperforming traditional pipelines like Kaldi GMM-HMM (MCDS: 0.52, BLEU: 38.1%) and Whisper-based models. It also surpasses baseline LLM systems by 0.09 MCDS. This sets a new benchmark in emotionally intelligent dysarthric speech reconstruction, with future directions including multilingual support and real-time deployment.
%U https://aclanthology.org/2025.findings-ijcnlp.63/
%P 1072-1080
Markdown (Informal)
[Emotion-Aware Dysarthric Speech Reconstruction: LLMs and Multimodal Evaluation with MCDS](https://aclanthology.org/2025.findings-ijcnlp.63/) (Attaluri et al., Findings 2025)
ACL
Kaushal Attaluri, Radhika Mamidi, Sireesha Chittepu, Anirudh Chebolu, and Hitendra Sarma Thogarcheti. 2025. Emotion-Aware Dysarthric Speech Reconstruction: LLMs and Multimodal Evaluation with MCDS. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1072–1080, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
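
For readability, the MCDS formula that appears garbled in the abstracts above can be typeset as follows. The symbol-to-component mapping is inferred from the initials in the abstract's listing (BLEU, emotion consistency, semantic similarity, human ratings); the actual weight values are defined in the paper, not here:

\[
\mathrm{MCDS} = \alpha B + \beta E + \gamma S + \delta H,
\qquad \alpha + \beta + \gamma + \delta = 1,
\]

where $B$ is the BLEU score, $E$ the emotion-consistency score, $S$ the semantic-similarity score, and $H$ the human rating, presumably each normalized so that MCDS lies in $[0, 1]$, consistent with the reported score of 0.87.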