Evaluating and Mitigating Inherent Linguistic Bias of African American English through Inference

Jamell Dacon, Haochen Liu, Jiliang Tang


Abstract
Recent studies show that NLP models trained on standard English texts tend to produce biased outcomes against underrepresented English varieties. In this work, we conduct a pioneering study of the use of the English variety African American English (AAE) in the natural language inference (NLI) task. First, we propose CodeSwitch, a greedy unidirectional morphosyntactically-informed rule-based translation method for data augmentation. Next, we use CodeSwitch to present a preliminary study to determine whether demographic language features do in fact influence models to produce false predictions. Then, we conduct experiments on two popular datasets and propose two simple, yet effective and generalizable debiasing methods. Our findings show that NLI models (e.g., BERT) trained under our proposed frameworks outperform traditional large language models while maintaining or even improving prediction performance. In addition, we intend to release CodeSwitch, in hopes of promoting dialectal language diversity in training data to both reduce discriminatory societal impacts and improve model robustness in downstream NLP tasks.
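To illustrate the general idea of a greedy, unidirectional rule-based translation for data augmentation, the following is a minimal sketch in Python. The rewrite rules below (copula deletion, "ain't" negation, "finna" for the immediate future) are illustrative stand-ins chosen for this example, not the paper's actual CodeSwitch rule set, which is morphosyntactically informed and far more extensive.

```python
import re

# Hypothetical SAE -> AAE rewrite rules for illustration only.
# Each rule is applied once, in order (greedy, unidirectional).
RULES = [
    (r"\bis not\b", "ain't"),                   # negation: "is not" -> "ain't"
    (r"\bare not\b", "ain't"),                  # negation: "are not" -> "ain't"
    (r"\bgoing to\b", "finna"),                 # immediate future marker
    (r"(\w+) (?:is|are) (\w+ing)\b", r"\1 \2"), # copula deletion before V-ing
]

def code_switch(sentence: str) -> str:
    """Greedily apply each rewrite rule left to right to augment one sentence."""
    for pattern, repl in RULES:
        sentence = re.sub(pattern, repl, sentence)
    return sentence

print(code_switch("She is running to the store."))  # -> "She running to the store."
print(code_switch("He is not here."))               # -> "He ain't here."
```

Augmented pairs produced this way can be added to the NLI training data alongside the original standard-English premises and hypotheses.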
Anthology ID:
2022.coling-1.124
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1442–1454
URL:
https://aclanthology.org/2022.coling-1.124
Cite (ACL):
Jamell Dacon, Haochen Liu, and Jiliang Tang. 2022. Evaluating and Mitigating Inherent Linguistic Bias of African American English through Inference. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1442–1454, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Evaluating and Mitigating Inherent Linguistic Bias of African American English through Inference (Dacon et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.124.pdf
Data
MultiNLI
SNLI