Camille Harris


2024

Modeling Gender and Dialect Bias in Automatic Speech Recognition
Camille Harris | Chijioke Mgbahurike | Neha Kumar | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2024

Dialect and gender-based biases have become an area of concern in language-dependent AI systems, including automatic speech recognition (ASR), which processes speech audio into text. These potential biases raise concerns about discriminatory outcomes from AI systems depending on demographics, particularly gender discrimination against women and racial discrimination against minorities who speak ethnic or cultural English dialects. As such, we aim to evaluate the performance of ASR systems across genders and across dialects of English. Concretely, we take a deep dive into the performance of ASR systems on men and women across four US-based English dialects: Standard American English (SAE), African American Vernacular English (AAVE), Chicano English, and Spanglish. To do this, we construct a labeled dataset of 13 hours of podcast audio, transcribed by speakers of the represented dialects. We then evaluate zero-shot performance of different automatic speech recognition models on our dataset, and further finetune models to better understand how finetuning can impact performance. Our work fills a gap by investigating possible gender disparities within underrepresented dialects.
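The abstract describes evaluating ASR output separately by gender and dialect. A minimal sketch of that kind of per-group comparison is below; the paper does not specify tooling or data layout, so the jiwer library, the record format, and the example rows are assumptions made purely for illustration.

```python
# Sketch: compute word error rate (WER) per (gender, dialect) group.
# Assumes records of (reference transcript, ASR hypothesis, gender, dialect);
# jiwer is used only as one convenient WER implementation.
from collections import defaultdict
import jiwer

records = [
    # hypothetical example rows, not data from the paper
    ("she was going to the store", "she was going to the store", "woman", "SAE"),
    ("he been working on that all week", "he bean working on that all week", "man", "AAVE"),
]

# Group references and hypotheses by (gender, dialect).
groups = defaultdict(lambda: ([], []))
for ref, hyp, gender, dialect in records:
    refs, hyps = groups[(gender, dialect)]
    refs.append(ref)
    hyps.append(hyp)

# Report WER for each demographic group.
for (gender, dialect), (refs, hyps) in sorted(groups.items()):
    print(f"{gender:>6} / {dialect:<8} WER = {jiwer.wer(refs, hyps):.3f}")
```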

2022

VALUE: Understanding Dialect Disparity in NLU
Caleb Ziems | Jiaao Chen | Camille Harris | Jessica Anderson | Diyi Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

English Natural Language Understanding (NLU) systems have achieved strong performance and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance.
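As a toy illustration of the rule-based transformation idea (not the released VALUE code), the sketch below applies a single morphosyntactic rule, zero copula, one of the AAVE features the benchmark covers, to a GLUE-style sentence before it would be re-scored by a model; the regex and the example sentence are assumptions for illustration only.

```python
# Toy rule: drop contractible "is"/"are" before a predicate (zero copula),
# e.g. "she is nice" -> "she nice". Real VALUE rules are validated by
# fluent AAVE speakers; this regex is only a simplified stand-in.
import re

def zero_copula(sentence: str) -> str:
    return re.sub(r"\b(is|are)\s+(?=\w)", "", sentence)

print(zero_copula("The movie is really good and the actors are talented."))
# -> "The movie really good and the actors talented."
```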