The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems

Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse


Abstract
Automatic speech recognition (ASR) systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This especially affects the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify the phenomena that need the most attention on the way to building robust interactive speech technologies.
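Word error rate (WER), the headline metric in study 1, is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit-distance alignment. The sketch below shows that standard computation. It is not the authors' evaluation pipeline: it assumes plain whitespace tokenization and applies no text normalization (casing, punctuation, disfluency handling), all of which real evaluations typically specify. The example illustrates how short conversational words such as "mm" that an ASR system drops are counted as deletions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption (not from the paper): whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of five reference words ("mm", "right") are dropped: 2 / 5 = 0.4
    print(word_error_rate("mm yeah I know right", "yeah I know"))
```

Because insertions are counted against the reference length, WER is not bounded by 1.0, which is one reason very poor conversational transcripts can score above 100%.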
Anthology ID:
2023.sigdial-1.45
Volume:
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
September
Year:
2023
Address:
Prague, Czechia
Editors:
Svetlana Stoyanchev, Shafiq Joty, David Schlangen, Ondrej Dusek, Casey Kennington, Malihe Alikhani
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
482–495
URL:
https://aclanthology.org/2023.sigdial-1.45
DOI:
10.18653/v1/2023.sigdial-1.45
Cite (ACL):
Andreas Liesenfeld, Alianda Lopez, and Mark Dingemanse. 2023. The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 482–495, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal):
The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems (Liesenfeld et al., SIGDIAL 2023)
PDF:
https://aclanthology.org/2023.sigdial-1.45.pdf