Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation

Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček


Anthology ID:
2023.ijcnlp-main.57
Volume:
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
November
Year:
2023
Address:
Nusa Dua, Bali
Editors:
Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, Adila Alfa Krisnadhi
Venues:
IJCNLP | AACL
Publisher:
Association for Computational Linguistics
Pages:
885–896
URL:
https://aclanthology.org/2023.ijcnlp-main.57
DOI:
10.18653/v1/2023.ijcnlp-main.57
Cite (ACL):
Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, and David Mareček. 2023. Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 885–896, Nusa Dua, Bali. Association for Computational Linguistics.
Cite (Informal):
Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation (Iluz et al., IJCNLP-AACL 2023)
PDF:
https://aclanthology.org/2023.ijcnlp-main.57.pdf
Software:
 2023.ijcnlp-main.57.Software.zip
Dataset:
 2023.ijcnlp-main.57.Dataset.zip