User-Centric Gender Rewriting

Bashar Alhafni, Nizar Habash, Houda Bouamor


Abstract
In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) – first and second grammatical persons with independent grammatical gender preferences. We focus on Arabic, a gender-marking, morphologically rich language. We develop a multi-step system that combines the strengths of rule-based and neural rewriting models. Our results demonstrate the viability of this approach on a recently created corpus for Arabic gender rewriting, achieving 88.42 M2 F0.5 on a blind test set. Our proposed system improves over previous work on the first-person-only version of this task by an absolute 3.05 points in M2 F0.5. We demonstrate a use case of our gender rewriting system by using it to post-edit the output of a commercial MT system to provide personalized outputs based on the users’ grammatical gender preferences. We make our code, data, and pretrained models publicly available.
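The F0.5 score reported in the abstract is an F-beta measure with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of that formula computed from edit-level counts; it is not the M2 scorer itself, and the function name `f_beta` and the counts used are hypothetical, chosen only for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute F-beta from true-positive, false-positive, and false-negative counts.

    With beta < 1 (e.g. 0.5), precision contributes more than recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts (not taken from the paper):
# 90 correct edits, 10 spurious edits, 20 missed edits.
score = f_beta(tp=90, fp=10, fn=20)
print(f"F0.5 = {score:.4f}")
```

Because beta is below 1, a system that proposes few but mostly correct edits scores higher than one that recovers more edits at the cost of spurious ones.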
Anthology ID:
2022.naacl-main.46
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
618–631
URL:
https://aclanthology.org/2022.naacl-main.46
DOI:
10.18653/v1/2022.naacl-main.46
Cite (ACL):
Bashar Alhafni, Nizar Habash, and Houda Bouamor. 2022. User-Centric Gender Rewriting. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 618–631, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
User-Centric Gender Rewriting (Alhafni et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.46.pdf
Code:
camel-lab/gender-rewriting
Data:
OpenSubtitles