Hellina Hailu Nigatu
2024
Gender Bias Evaluation in Machine Translation for Amharic, Tigrigna, and Afaan Oromoo
Walelign Sewunetie | Atnafu Tonja | Tadesse Belay | Hellina Hailu Nigatu | Gashaw Gebremeskel | Zewdie Mossie | Hussien Seid | Seid Yimam
Proceedings of the 2nd International Workshop on Gender-Inclusive Translation Technologies
While Machine Translation (MT) research has progressed over the years, translation systems still suffer from biases, including gender bias. While an active line of research studies the existence of gender bias in machine translation systems and strategies to mitigate it, there is limited research exploring this phenomenon for low-resource languages. The limited availability of linguistic and computational resources, compounded by the lack of benchmark datasets, makes studying bias for low-resource languages that much more difficult. In this paper, we construct benchmark datasets to evaluate gender bias in machine translation for three low-resource languages: Afaan Oromoo (Orm), Amharic (Amh), and Tigrinya (Tir). Building on prior work, we collected 2,400 gender-balanced sentences translated in parallel into the three languages. From human evaluations of the dataset we collected, we found that about 93% of Afaan Oromoo, 80% of Tigrinya, and 72% of Amharic sentences exhibited gender bias. In addition to providing benchmarks for improving gender bias mitigation research in the three languages, we hope the careful documentation of our work will help researchers working on other low-resource languages extend our approach.
2023
Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models
Atnafu Lambebo Tonja | Hellina Hailu Nigatu | Olga Kolesnikova | Grigori Sidorov | Alexander Gelbukh | Jugal Kalita
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
This paper describes CIC NLP’s submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) model, the Helsinki-NLP Spanish-English translation model, and experimented with different transfer learning setups. We experimented with 11 languages of the Americas and report the setups we used as well as the results we achieved. Overall, the mBART setup was able to improve upon the baseline for three out of the eleven languages.
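The abstract above mentions building on multilingual models such as mBART50. As a hedged illustration only, and not the authors' actual setup, the sketch below shows how an mBART-50 checkpoint can be loaded for Spanish-source translation with the Hugging Face transformers library; the checkpoint name and the English target are assumptions here, since the shared task's indigenous target languages are not in the pretrained language list and would require the kind of fine-tuning the paper explores.

```python
# Minimal sketch (an assumption, not the paper's exact setup): load mBART-50
# and translate a Spanish sentence with Hugging Face transformers.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

checkpoint = "facebook/mbart-large-50-many-to-many-mmt"  # assumed public checkpoint
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)

# Spanish is the source side of the AmericasNLP shared task.
tokenizer.src_lang = "es_XX"
encoded = tokenizer("La ciudad está cerca del río.", return_tensors="pt")

# English stands in as a target for illustration; the indigenous target
# languages would need to be added via fine-tuning.
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```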