Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces

Tuomas Kaseva, Hemant Kumar Kathania, Aku Rouhe, Mikko Kurimo


Abstract
For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children’s speech. This is due to the acoustic mismatch between training and testing data. To capture more acoustic variability, we trained a shared system with mixed data from adults and children. The shared system yields the best equal error rate (EER) for children with no degradation for adults. Thus, a single system trained with mixed data is applicable to speaker verification for both adults and children.
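
The abstract reports results as EER (equal error rate) but does not describe how it was computed. As a rough illustration only, the sketch below estimates EER from two lists of trial scores, assuming that a higher score means "more likely the same speaker". The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate EER from same-speaker and different-speaker trial scores.

    Assumption (not from the paper): higher scores indicate the same speaker,
    and a trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is tried as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials scored at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Toy scores, invented for illustration only.
toy_target = [0.9, 0.6, 0.3, 0.8]
toy_nontarget = [0.1, 0.2, 0.7, 0.4]
toy_eer = equal_error_rate(toy_target, toy_nontarget)
```

A lower EER means the two error types balance at a lower rate, which is the sense in which the abstract calls the shared system "best" for children. Evaluation toolkits typically interpolate between thresholds, so their values can differ slightly from this simple estimate.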
Anthology ID:
2021.nodalida-main.9
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May 31–June 2
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
86–93
URL:
https://aclanthology.org/2021.nodalida-main.9
Cite (ACL):
Tuomas Kaseva, Hemant Kumar Kathania, Aku Rouhe, and Mikko Kurimo. 2021. Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 86–93, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Speaker Verification Experiments for Adults and Children Using Shared Embedding Spaces (Kaseva et al., NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.9.pdf
Data:
VoxCeleb1, VoxCeleb2