Generative Approach for Gender-Rewriting Task with ArabicT5

Sultan Alrowili, Vijay Shanker


Abstract
Handling gender correctly in generative tasks (e.g., machine translation) has been an overlooked issue in Arabic NLP. However, the recent introduction of the Arabic Parallel Gender Corpus (APGC) dataset has established new baselines for the Arabic gender-rewriting task. To address this task, we first pre-train our new Seq2Seq model, ArabicT5, on 17GB of Arabic corpora. We then continue pre-training ArabicT5 on the APGC dataset using a newly proposed method. Our evaluation shows that ArabicT5, when trained on the APGC dataset, achieves results competitive with existing state-of-the-art methods. In addition, ArabicT5 achieves better results on the APGC dataset than other Arabic and multilingual T5 models.
Anthology ID:
2022.wanlp-1.55
Volume:
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
491–495
URL:
https://aclanthology.org/2022.wanlp-1.55
DOI:
10.18653/v1/2022.wanlp-1.55
Cite (ACL):
Sultan Alrowili and Vijay Shanker. 2022. Generative Approach for Gender-Rewriting Task with ArabicT5. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 491–495, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Generative Approach for Gender-Rewriting Task with ArabicT5 (Alrowili & Shanker, WANLP 2022)
PDF:
https://aclanthology.org/2022.wanlp-1.55.pdf