Does BERT Exacerbate Gender or L1 Biases in Automated English Speaking Assessment?

Alexander Kwako, Yixin Wan, Jieyu Zhao, Mark Hansen, Kai-Wei Chang, Li Cai


Abstract
In English speaking assessment, pretrained large language models (LLMs) such as BERT can score constructed response items as accurately as human raters. Less research has investigated whether LLMs perpetuate or exacerbate biases, which would pose problems for the fairness and validity of the test. This study examines gender and native language (L1) biases in human and automated scores, using an off-the-shelf (OOS) BERT model. Analyses focus on a specific type of bias known as differential item functioning (DIF), which compares examinees of similar English language proficiency. Results show that there is a moderate amount of DIF, based on examinees’ L1 background in grade band 9–12. DIF is higher when scored by an OOS BERT model, indicating that BERT may exacerbate this bias; however, in practical terms, the degree to which BERT exacerbates DIF is very small. Additionally, there is more DIF for longer speaking items and for older examinees, but BERT does not exacerbate these patterns of DIF.
Anthology ID:
2023.bea-1.54
Volume:
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
668–681
URL:
https://aclanthology.org/2023.bea-1.54
DOI:
10.18653/v1/2023.bea-1.54
Cite (ACL):
Alexander Kwako, Yixin Wan, Jieyu Zhao, Mark Hansen, Kai-Wei Chang, and Li Cai. 2023. Does BERT Exacerbate Gender or L1 Biases in Automated English Speaking Assessment? In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 668–681, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Does BERT Exacerbate Gender or L1 Biases in Automated English Speaking Assessment? (Kwako et al., BEA 2023)
PDF:
https://aclanthology.org/2023.bea-1.54.pdf
Video:
 https://aclanthology.org/2023.bea-1.54.mp4