Speech-Integrated Modeling for Behavioral Coding in Counseling

Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea


Abstract
Computational models of psychotherapy often ignore vocal cues by relying solely on text. To address this, we propose MISQ, a framework that integrates speech features directly into language models using a speech encoder and lightweight adapter. MISQ improves behavioral analysis in counseling conversations, achieving ~5% relative gains over text-only or indirect speech methods, which underscores the value of vocal signals such as tone and prosody.
Anthology ID:
2025.sigdial-1.10
Volume:
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
August
Year:
2025
Address:
Avignon, France
Editors:
Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin
Venue:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
152–158
URL:
https://aclanthology.org/2025.sigdial-1.10/
Cite (ACL):
Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, and Rada Mihalcea. 2025. Speech-Integrated Modeling for Behavioral Coding in Counseling. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 152–158, Avignon, France. Association for Computational Linguistics.
Cite (Informal):
Speech-Integrated Modeling for Behavioral Coding in Counseling (Min et al., SIGDIAL 2025)
PDF:
https://aclanthology.org/2025.sigdial-1.10.pdf