A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

Angus Addlesee, Yanchao Yu, Arash Eshghi


Abstract
Automatic Speech Recognition (ASR) systems are increasingly powerful and accurate, and also more numerous, with several options currently available as a service (e.g. Google, IBM, and Microsoft). Currently, the most stringent standards for such systems are set within the context of their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real time, be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech, such as disfluencies and overlaps. In this paper, we evaluate the most popular of these systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems, which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system which preserves disfluent materials, and IBM has the leading incremental SD system in addition to the ASR that is most robust to speech overlaps. Google strikes a balance between the two, but none of these systems is yet suitable for reliably handling natural spontaneous conversations in real time.
Anthology ID:
2020.coling-main.312
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3492–3503
URL:
https://aclanthology.org/2020.coling-main.312
DOI:
10.18653/v1/2020.coling-main.312
Cite (ACL):
Angus Addlesee, Yanchao Yu, and Arash Eshghi. 2020. A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3492–3503, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI (Addlesee et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.312.pdf
Code:
wallscope-research/incremental-asr-processing