Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning

Minseok Kim; Jingxiang Chen; Seong-Gyun Leem; Yin Huang; Rashi Rungta; Zhicheng Ouyang; Haibin Wu; Surya Teja Appini; Ankur Bansal; Yang Bai; Yue Liu; Florian Metze; Ahmed A Aly; Anuj Kumar; Ariya Rastrow; Zhaojiang Lin

Abstract

Speech large language models (LLMs) observe paralinguistic cues such as prosody, emotion, and non-verbal sounds, which are crucial for understanding intent. However, leveraging these cues faces three challenges: limited training data, annotation difficulty, and models that exploit lexical shortcuts instead of paralinguistic signals. We propose multi-task reinforcement learning (RL) with chain-of-thought prompting that elicits explicit affective reasoning. To address data scarcity, we introduce a paralinguistics-aware speech LLM (PALLM) that jointly optimizes sentiment classification from audio and paralinguistics-aware response generation via a two-stage pipeline. Experiments demonstrate that our approach improves paralinguistic understanding over both supervised baselines and strong proprietary models (Gemini-2.5-Pro, GPT-4o-audio) by 8-12% on Expresso, IEMOCAP, and RAVDESS. The results show that modeling paralinguistic reasoning with multi-task RL is crucial for building emotionally intelligent speech LLMs.

Anthology ID: 2026.eacl-industry.49
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 636–648
URL: https://aclanthology.org/2026.eacl-industry.49/
Cite (ACL): Minseok Kim, Jingxiang Chen, Seong-Gyun Leem, Yin Huang, Rashi Rungta, Zhicheng Ouyang, Haibin Wu, Surya Teja Appini, Ankur Bansal, Yang Bai, Yue Liu, Florian Metze, Ahmed A Aly, Anuj Kumar, Ariya Rastrow, and Zhaojiang Lin. 2026. Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 636–648, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning (Kim et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-industry.49.pdf
