Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing

Olha Nahurna; Mariana Romanyshyn

doi:10.18653/v1/2025.unlp-1.16

Abstract

This paper presents a pipeline for generating gender-balanced datasets through sentence-level gender swapping, addressing the gender imbalance in Ukrainian texts. We select sentences with gender-marked entities, focusing on job titles, generate their gender-inverted alternatives using LLMs with a human in the loop, and fine-tune Aya-101 on the resulting dataset for the task of gender swapping. Additionally, we train a Named Entity Recognition (NER) model on gender-balanced data and demonstrate that it recognizes gendered entities better. The findings show the potential of gender-balanced datasets to enhance model robustness and support fairer language processing. Finally, we make a gender-swapped version of NER-UK 2.0 and the fine-tuned Aya-101 model available for download and further research.

Anthology ID: 2025.unlp-1.16
Volume: Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria (online)
Editor: Mariana Romanyshyn
Venues: UNLP | WS
Publisher: Association for Computational Linguistics
Pages: 147–161
URL: https://aclanthology.org/2025.unlp-1.16/
DOI: 10.18653/v1/2025.unlp-1.16
Cite (ACL): Olha Nahurna and Mariana Romanyshyn. 2025. Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 147–161, Vienna, Austria (online). Association for Computational Linguistics.
Cite (Informal): Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing (Nahurna & Romanyshyn, UNLP 2025)
PDF: https://aclanthology.org/2025.unlp-1.16.pdf
