Are Pretrained Multilingual Models Equally Fair across Languages?

Laura Cabello Piqueras, Anders Søgaard


Abstract
Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, given their widespread application in the wild and their downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., near-equal risk for Spanish but high levels of disparity for German.
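To illustrate the kind of evaluation the abstract describes, the sketch below scores cloze examples with a multilingual masked language model and compares accuracy across demographic groups. This is a minimal sketch, not the paper's pipeline: the record fields, the example sentences, and the accuracy-gap disparity measure are illustrative assumptions; the actual MozArt data and fairness metrics are defined in the paper and the linked repository.

# Minimal sketch: query a multilingual masked LM on cloze examples and
# compare per-group accuracy. Illustrative only; field names, sentences,
# and the accuracy-gap proxy for "equal risk" are assumptions, not the
# paper's exact setup or metric.
from collections import defaultdict
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Hypothetical MozArt-style records: a cloze sentence, the expected word,
# and the demographic group of the participant who supplied the gold answer.
examples = [
    {"text": "Die Hauptstadt von Deutschland ist [MASK].", "gold": "Berlin", "group": "female"},
    {"text": "La capital de España es [MASK].", "gold": "Madrid", "group": "male"},
]

hits, totals = defaultdict(int), defaultdict(int)
for ex in examples:
    prediction = fill_mask(ex["text"], top_k=1)[0]["token_str"].strip()
    totals[ex["group"]] += 1
    hits[ex["group"]] += int(prediction == ex["gold"])

# Group disparity as the max-min gap in per-group accuracy (one simple proxy;
# the paper may formulate equal risk differently).
accuracy = {g: hits[g] / totals[g] for g in totals}
print(accuracy, max(accuracy.values()) - min(accuracy.values()))

Swapping the model name for "xlm-roberta-base" (with its "<mask>" token) would repeat the same comparison for XLM-R.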
Anthology ID:
2022.coling-1.318
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3597–3605
URL:
https://aclanthology.org/2022.coling-1.318
Cite (ACL):
Laura Cabello Piqueras and Anders Søgaard. 2022. Are Pretrained Multilingual Models Equally Fair across Languages?. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3597–3605, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Are Pretrained Multilingual Models Equally Fair across Languages? (Cabello Piqueras & Søgaard, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.318.pdf
Code:
coastalcph/mozart