Evaluating and Improving Child-Directed Automatic Speech Recognition

Eric Booth; Jake Carns; Casey Kennington; Nader Rafla

Abstract

Speech recognition has seen dramatic improvements in the last decade, though those improvements have focused primarily on adult speech. In this paper, we assess child-directed speech recognition and use transfer learning to improve it: we train the recent DeepSpeech2 model on adult data, then fine-tune it on varied amounts of child speech data. We evaluate our model using the CMU Kids dataset as well as our own recordings of child-directed prompts. Our experiments show that adding even a small amount of child audio data yields a significant improvement over baseline models trained only on adult data or only on child data. We report a final general word error rate (WER) of 29%, compared with 62% for the baseline adult-trained model. Our analyses show that our model adapts quickly using a small amount of data and that the general child model works better than school grade-specific models. We make our trained model and our data collection tool available.
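The abstract reports results as word error rate. As a minimal sketch of how that metric is commonly computed (the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words), the following may help. The function name and the example sentences are illustrative assumptions, not taken from the paper, and the authors' actual scoring (for example, any text normalization) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the paper's scoring script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one word dropped from a six-word prompt.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.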

Anthology ID: 2020.lrec-1.778
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6340–6345
Language: English
URL: https://aclanthology.org/2020.lrec-1.778/
Cite (ACL): Eric Booth, Jake Carns, Casey Kennington, and Nader Rafla. 2020. Evaluating and Improving Child-Directed Automatic Speech Recognition. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6340–6345, Marseille, France. European Language Resources Association.
Cite (Informal): Evaluating and Improving Child-Directed Automatic Speech Recognition (Booth et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.778.pdf
