Gender and Dialect Bias in YouTube’s Automatic Captions

Rachael Tatman


Abstract
This project evaluates the accuracy of YouTube’s automatically generated captions across two genders and five dialect groups. Speakers’ dialect and gender were controlled for by using videos uploaded as part of the “accent tag challenge”, in which speakers explicitly identify their language background. The results show robust differences in accuracy across both gender and dialect, with lower accuracy for 1) women and 2) speakers from Scotland. This finding builds on earlier research showing that speakers’ sociolinguistic identity may negatively impact their ability to use automatic speech recognition, and demonstrates the need for sociolinguistically stratified validation of systems.
Anthology ID:
W17-1606
Volume:
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
Month:
April
Year:
2017
Address:
Valencia, Spain
Venues:
EthNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
53–59
URL:
https://aclanthology.org/W17-1606
DOI:
10.18653/v1/W17-1606
PDF:
https://aclanthology.org/W17-1606.pdf