A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models

Yi Zhou; Jose Camacho-Collados; Danushka Bollegala

doi:10.18653/v1/2023.emnlp-main.683

Abstract

Various types of social biases have been reported with pretrained Masked Language Models (MLMs) in prior work. However, multiple underlying factors are associated with an MLM such as its model size, size of the training data, training objectives, the domain from which pretraining data is sampled, tokenization, and languages present in the pretrained corpora, to name a few. It remains unclear as to which of those factors influence social biases that are learned by MLMs. To study the relationship between model factors and the social biases learned by an MLM, as well as the downstream task performance of the model, we conduct a comprehensive study over 39 pretrained MLMs covering different model sizes, training objectives, tokenization methods, training data domains and languages. Our results shed light on important factors often neglected in prior literature, such as tokenization or model objectives.

Anthology ID: 2023.emnlp-main.683
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11082–11100
URL: https://aclanthology.org/2023.emnlp-main.683/
DOI: 10.18653/v1/2023.emnlp-main.683
Cite (ACL): Yi Zhou, Jose Camacho-Collados, and Danushka Bollegala. 2023. A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11082–11100, Singapore. Association for Computational Linguistics.
Cite (Informal): A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models (Zhou et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.683.pdf
Video: https://aclanthology.org/2023.emnlp-main.683.mp4
