Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

Kyudan Jung; Jihwan Kim; Soyoon Kim; Jeonghoon Kim; Jaegul Choo; Cheonbok Park

Abstract

As the paradigm of AI shifts from text-based LLMs to Speech Language Models (SLMs), there is a growing demand for full-duplex systems capable of real-time, natural human-computer interaction. However, the development of such models is constrained by the scarcity of high-quality, multi-speaker conversational data, as existing large-scale resources are predominantly single-speaker or limited in volume. Handling the complex dynamics of natural dialogue, such as overlapping speech and back-channeling, remains a challenge, with standard processing pipelines suffering from diarization errors and ASR hallucinations. To bridge this gap, we present a robust and scalable open-source data processing pipeline designed for full-duplex models. Our code and project page are publicly available at https://anonymous-2001-j.github.io/sommelier.github.io/.

Anthology ID: 2026.acl-industry.18
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 259–284
URL: https://aclanthology.org/2026.acl-industry.18/
Cite (ACL): Kyudan Jung, Jihwan Kim, Soyoon Kim, Jeonghoon Kim, Jaegul Choo, and Cheonbok Park. 2026. Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 259–284, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models (Jung et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-industry.18.pdf
