Multilingual and Continuous Backchannel Prediction: A Cross-lingual Study

Koji Inoue, Mikey Elmers, Yahui Fu, Zi Haur Pang, Taiga Mori, Divesh Lala, Keiko Ochi, Tatsuya Kawahara


Abstract
We present a multilingual, continuous backchannel prediction model for Japanese, English, and Chinese, and use it to investigate cross-linguistic timing behavior. The model is Transformer-based and operates at the frame level, jointly trained with auxiliary tasks on approximately 300 hours of dyadic conversations. Across all three languages, the multilingual model matches or surpasses monolingual baselines, indicating that it learns both language-universal cues and language-specific timing patterns. Zero-shot transfer with two-language training remains limited, underscoring substantive cross-lingual differences. Perturbation analyses reveal distinct cue usage: Japanese relies more on short-term linguistic information, whereas English and Chinese are more sensitive to silence duration and prosodic variation; multilingual training encourages shared yet adaptable representations and reduces overreliance on pitch in Chinese. A context-length study further shows that Japanese is relatively robust to shorter contexts, while Chinese benefits markedly from longer contexts. Finally, we integrate the trained model into real-time processing software, demonstrating CPU-only inference. Together, these findings provide a unified model and empirical evidence for how backchannel timing differs across languages, informing the design of more natural, culturally aware spoken dialogue systems.
Anthology ID:
2026.iwsds-1.23
Volume:
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
Month:
February
Year:
2026
Address:
Trento, Italy
Editors:
Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
Venue:
IWSDS
Publisher:
Association for Computational Linguistics
Pages:
222–230
URL:
https://aclanthology.org/2026.iwsds-1.23/
Cite (ACL):
Koji Inoue, Mikey Elmers, Yahui Fu, Zi Haur Pang, Taiga Mori, Divesh Lala, Keiko Ochi, and Tatsuya Kawahara. 2026. Multilingual and Continuous Backchannel Prediction: A Cross-lingual Study. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 222–230, Trento, Italy. Association for Computational Linguistics.
Cite (Informal):
Multilingual and Continuous Backchannel Prediction: A Cross-lingual Study (Inoue et al., IWSDS 2026)
PDF:
https://aclanthology.org/2026.iwsds-1.23.pdf