Hellina Hailu Nigatu


2024

pdf bib
The Zeno’s Paradox of ‘Low-Resource’ Languages
Hellina Hailu Nigatu | Atnafu Lambebo Tonja | Benjamin Rosman | Thamar Solorio | Monojit Choudhury
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The disparity in the languages commonly studied in Natural Language Processing (NLP) is typically reflected by referring to languages as low vs high-resourced. However, there is limited consensus on what exactly qualifies as a ‘low-resource language.’ To understand how NLP papers define and study ‘low resource’ languages, we qualitatively analyzed 150 papers from the ACL Anthology and popular speech-processing conferences that mention the keyword ‘low-resource.’ Based on our analysis, we show how several interacting axes contribute to ‘low-resourcedness’ of a language and why that makes it difficult to track progress for each individual language. We hope our work (1) elicits explicit definitions of the terminology when it is used in papers and (2) provides grounding for the different axes to consider when connoting a language as low-resource.

pdf bib
Gender Bias Evaluation in Machine Translation for Amharic, Tigrigna, and Afaan Oromoo
Walelign Sewunetie | Atnafu Tonja | Tadesse Belay | Hellina Hailu Nigatu | Gashaw Gebremeskel | Zewdie Mossie | Hussien Seid | Seid Yimam
Proceedings of the 2nd International Workshop on Gender-Inclusive Translation Technologies

While Machine Translation (MT) research has progressed over the years, translation systems still suffer from biases, including gender bias. While an active line of research studies the existence and mitigation strategies of gender bias in machine translation systems, there is limited research exploring this phenomenon for low-resource languages. The limited availability of linguistic and computational resources confounded with the lack of benchmark datasets makes studying bias for low-resourced languages that much more difficult. In this paper, we construct benchmark datasets to evaluate gender bias in machine translation for three low-resource languages: Afaan Oromoo (Orm), Amharic (Amh), and Tigrinya (Tir). Building on prior work, we collected 2400 gender-balanced sentences parallelly translated into the three languages. From human evaluations of the dataset we collected, we found that about 93% of Afaan Oromoo, 80% of Tigrinya, and 72% of Amharic sentences exhibited gender bias. In addition to providing benchmarks for improving gender bias mitigation research in the three languages, we hope the careful documentation of our work will help other low-resourced language researchers extend our approach to their languages.

pdf bib
Proceedings of the Eighth Widening NLP Workshop
Atnafu Lambebo Tonja | Alfredo Gomez | Chanjun Park | Hellina Hailu Nigatu | Santosh T.Y.S.S | Tanvi Anand | Wiem Ben Rim
Proceedings of the Eighth Widening NLP Workshop

2023

pdf bib
Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models
Atnafu Lambebo Tonja | Hellina Hailu Nigatu | Olga Kolesnikova | Grigori Sidorov | Alexander Gelbukh | Jugal Kalita
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)

This paper describes CIC NLP’s submission to the AmericasNLP 2023 Shared Task on machine translation systems for indigenous languages of the Americas. We present the system descriptions for three methods. We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) — Helsinki NLP Spanish-English translation model, and experimented with different transfer learning setups. We experimented with 11 languages from America and report the setups we used as well as the results we achieved. Overall, the mBART setup was able to improve upon the baseline for three out of the eleven languages.