Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

Mahault Garnerin, Solange Rossato, Laurent Besacier


Abstract
In this paper, we examine the impact of gender representation in training data on the performance of an end-to-end ASR system. We design an experiment based on the Librispeech corpus and build three training corpora that differ only in the proportion of data produced by each gender category. We observe that, although our system is robust overall to gender balance or imbalance in the training data, its performance nonetheless depends on how well the individuals in the training set match those in the test set.
Anthology ID:
2021.gebnlp-1.10
Volume:
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | GeBNLP | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
86–92
URL:
https://aclanthology.org/2021.gebnlp-1.10
DOI:
10.18653/v1/2021.gebnlp-1.10
Cite (ACL):
Mahault Garnerin, Solange Rossato, and Laurent Besacier. 2021. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing, pages 86–92, Online. Association for Computational Linguistics.
Cite (Informal):
Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech (Garnerin et al., GeBNLP 2021)
PDF:
https://aclanthology.org/2021.gebnlp-1.10.pdf
Data
LibriSpeech