Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

Philipp Klumpp, Tomas Arias, Paula Andrea Pérez-Toro, Elmar Noeth, Juan Orozco-Arroyave


Abstract
Current state-of-the-art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets to maintain decent generalization of the final decision function. An ideal dataset is not necessarily large in size, but large with respect to the number of unique speakers, the recording hardware used, and the variety of recording conditions. This enables a machine learning model to explore as much of the domain-specific input space as possible during parameter estimation. This work introduces Common Phone, a gender-balanced, multilingual corpus recorded from more than 76,000 contributors via Mozilla’s Common Voice project. It comprises around 116 hours of speech enriched with automatically generated phonetic segmentation. A Wav2Vec 2.0 acoustic model was trained with Common Phone to perform phonetic symbol recognition and validate the quality of the generated phonetic annotation. The architecture achieved a phone error rate (PER) of 18.1% on the entire test set, computed with all 101 unique phonetic symbols, showing slight differences between the individual languages. We conclude that Common Phone provides sufficient variability and reliable phonetic annotation to help bridge the gap between research and application of acoustic models.
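For readers unfamiliar with the PER metric reported in the abstract, the following is a minimal sketch of how such a rate is conventionally computed: the total Levenshtein (edit) distance between reference and predicted symbol sequences, divided by the total number of reference symbols. It is not the authors' scoring code. The function names `edit_distance` and `phone_error_rate` and the example symbols are hypothetical, and the abstract does not state how the paper aggregates errors across utterances.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference symbol
                curr[j - 1] + 1,     # insertion of a hypothesis symbol
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Corpus-level rate: summed edit distance over summed reference length.

    Assumption: errors are pooled over all utterances before dividing;
    the abstract does not say which aggregation the paper uses.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference sequences are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Hypothetical example: one substitution in four reference symbols.
reference = [["h", "ɛ", "l", "oʊ"]]
prediction = [["h", "ɛ", "l", "ə"]]
per = phone_error_rate(reference, prediction)
```

Under this definition, a PER of 18.1% would correspond to roughly 18 substitutions, insertions, or deletions per 100 reference symbols.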
Anthology ID:
2022.lrec-1.81
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
763–768
URL:
https://aclanthology.org/2022.lrec-1.81
Cite (ACL):
Philipp Klumpp, Tomas Arias, Paula Andrea Pérez-Toro, Elmar Noeth, and Juan Orozco-Arroyave. 2022. Common Phone: A Multilingual Dataset for Robust Acoustic Modelling. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 763–768, Marseille, France. European Language Resources Association.
Cite (Informal):
Common Phone: A Multilingual Dataset for Robust Acoustic Modelling (Klumpp et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.81.pdf
Data
Common Phone, Common Voice