UniCoM: A Universal Code-Switching Speech Generator

Sangmin Lee, Woojin Chung, Seyun Um, Hong-Goo Kang


Abstract
Code-switching (CS), the alternation between two or more languages within a single speaker’s utterances, is common in real-world conversations and poses significant challenges for multilingual speech technology. However, systems capable of handling this phenomenon remain underexplored, primarily due to the scarcity of suitable datasets. To resolve this issue, we propose Universal Code-Mixer (UniCoM), a novel pipeline for generating high-quality, natural CS samples without altering sentence semantics. Our approach utilizes an algorithm we call Substituting WORDs with Synonyms (SWORDS), which generates CS speech by replacing selected words with their translations while considering their parts of speech. Using UniCoM, we construct Code-Switching FLEURS (CS-FLEURS), a multilingual CS corpus designed for automatic speech recognition (ASR) and speech-to-text translation (S2TT). Experimental results show that CS-FLEURS achieves high intelligibility and naturalness, performing comparably to existing datasets on both objective and subjective metrics. We expect our approach to advance CS speech technology and enable more inclusive multilingual systems.
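To make the word-substitution idea behind SWORDS concrete, below is a minimal, hypothetical sketch of POS-constrained lexical substitution in Python. The lexicon, tag set, probabilities, and function names are illustrative assumptions only; the authors' actual pipeline also generates speech and handles translation and semantics, none of which is shown here.

# Illustrative sketch of POS-constrained word substitution for code-switching,
# loosely inspired by the SWORDS description in the abstract. All names,
# the lexicon, and the tag set are hypothetical, not the authors' code.
import random

# Hypothetical bilingual lexicon (English -> Spanish) for demonstration.
LEXICON = {
    "house": "casa",
    "eat": "comer",
    "beautiful": "hermosa",
}

# Content-word POS tags assumed eligible for switching.
SWITCHABLE_POS = {"NOUN", "VERB", "ADJ"}

def substitute_words(tokens, pos_tags, switch_prob=0.5, seed=0):
    """Replace eligible content words with their dictionary translations.

    tokens   : list of source-language words
    pos_tags : list of POS tags aligned with tokens
    """
    rng = random.Random(seed)
    output = []
    for word, pos in zip(tokens, pos_tags):
        translation = LEXICON.get(word.lower())
        if pos in SWITCHABLE_POS and translation and rng.random() < switch_prob:
            output.append(translation)  # switch this word to the other language
        else:
            output.append(word)         # keep the original word
    return output

if __name__ == "__main__":
    sent = ["The", "beautiful", "house", "is", "old"]
    tags = ["DET", "ADJ", "NOUN", "AUX", "ADJ"]
    print(" ".join(substitute_words(sent, tags)))

Restricting substitution to content-word POS categories (as the abstract suggests) is one simple way to keep the mixed sentence grammatical while preserving its meaning.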
Anthology ID:
2025.findings-emnlp.715
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13273–13288
URL:
https://aclanthology.org/2025.findings-emnlp.715/
Cite (ACL):
Sangmin Lee, Woojin Chung, Seyun Um, and Hong-Goo Kang. 2025. UniCoM: A Universal Code-Switching Speech Generator. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 13273–13288, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
UniCoM: A Universal Code-Switching Speech Generator (Lee et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.715.pdf
Checklist:
 2025.findings-emnlp.715.checklist.pdf