Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting

Christian Huber, Juan Hussain, Tuan-Nam Nguyen, Kaihang Song, Sebastian Stüker, Alexander Waibel


Abstract
When training speech recognition systems, one often faces the situation that sufficient training data is available for the language in question, but only a small amount for the domain in question. This problem is even more severe for end-to-end speech recognition systems, which only accept transcribed speech as training data; such data is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems using a method called batch-weighting, which we contrast with regular fine-tuning, i.e., continuing to train existing neural speech recognition models on adaptation data. We perform experiments using these techniques in adapting to topic, accent and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. To show the generalization capabilities of batch-weighting, we perform experiments in several languages, i.e., Arabic, English and German. Due to its relatively small computational requirements, batch-weighting is a suitable technique for supervised life-long learning during the lifetime of a speech recognition system, e.g., from user corrections.
Anthology ID:
2020.lifelongnlp-1.2
Volume:
Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
William M. Campbell, Alex Waibel, Dilek Hakkani-Tur, Timothy J. Hazen, Kevin Kilgour, Eunah Cho, Varun Kumar, Hadrien Glaude
Venue:
lifelongnlp
Publisher:
Association for Computational Linguistics
Pages:
9–17
URL:
https://aclanthology.org/2020.lifelongnlp-1.2
Cite (ACL):
Christian Huber, Juan Hussain, Tuan-Nam Nguyen, Kaihang Song, Sebastian Stüker, and Alexander Waibel. 2020. Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting. In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems, pages 9–17, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting (Huber et al., lifelongnlp 2020)
PDF:
https://aclanthology.org/2020.lifelongnlp-1.2.pdf
Data
How2