Ricky Macharm - ACL Anthology

2024

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang | David Ifeoluwa Adelani | Sweta Agrawal | Marek Masiak | Ricardo Rei | Eleftheria Briakou | Marine Carpuat | Xuanli He | Sofia Bourhim | Andiswa Bukula | Muhidin Mohamed | Temitayo Olatoye | Tosin Adewumi | Hamam Mokayed | Christine Mwase | Wangui Kimotho | Foutse Yuehgoh | Anuoluwapo Aremu | Jessica Ojo | Shamsuddeen Hassan Muhammad | Salomey Osei | Abdul-Hakeem Omotayo | Chiamaka Chukwuneke | Perez Ogayo | Oumaima Hourrane | Salma El Anigri | Lolwethu Ndolela | Thabiso Mangwana | Shafie Abdi Mohamed | Hassan Ayinde | Oluwabusayo Olufunke Awoyomi | Lama Alkhaled | Sana Al-azzawi | Naome A. Etori | Millicent Ochieng | Clemencia Siro | Njoroge Kiragu | Eric Muchiri | Wangari Kimotho | Lyse Naomi Wamba Momo | Daud Abolade | Simbiat Ajao | Iyanuoluwa Shode | Ricky Macharm | Ruqayya Nasir Iro | Saheed S. Abdullahi | Stephen E. Moore | Bernard Opoku | Zainab Akinjobi | Abeeb Afolabi | Nnaemeka Obiefuna | Onyekachi Raphael Ogbu | Sam Ochieng’ | Verrah Akinyi Otiende | Chinedu Emmanuel Mbonu | Sakayo Toadoum Sari | Yao Lu | Pontus Stenetorp
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed with n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create state-of-the-art MT evaluation metrics for African languages with respect to Spearman rank correlation with human judgments (0.441).

2020

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
Wilhelmina Nekoto | Vukosi Marivate | Tshinondiwa Matsila | Timi Fasubaa | Taiwo Fagbohungbe | Solomon Oluwole Akinola | Shamsuddeen Muhammad | Salomon Kabongo Kabenamualu | Salomey Osei | Freshia Sackey | Rubungo Andre Niyongabo | Ricky Macharm | Perez Ogayo | Orevaoghene Ahia | Musie Meressa Berhe | Mofetoluwa Adeyemi | Masabata Mokgesi-Selinga | Lawrence Okegbemi | Laura Martinus | Kolawole Tajudeen | Kevin Degila | Kelechi Ogueji | Kathleen Siminyu | Julia Kreutzer | Jason Webster | Jamiil Toure Ali | Jade Abbott | Iroro Orife | Ignatius Ezeani | Idris Abdulkadir Dangana | Herman Kamper | Hady Elsahar | Goodness Duru | Ghollah Kioko | Murhabazi Espoir | Elan van Biljon | Daniel Whitenack | Christopher Onyefuluchi | Chris Chinenye Emezue | Bonaventure F. P. Dossou | Blessing Sibanda | Blessing Bassey | Ayodele Olabiyi | Arshath Ramkilowan | Alp Öktem | Adewale Akinfaderin | Abdallah Bashir
Findings of the Association for Computational Linguistics: EMNLP 2020

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. ‘Low-resourced’-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

Co-authors

David Ifeoluwa Adelani 1

Tosin Adewumi 1

Mofetoluwa Adeyemi 1

Abeeb Afolabi 1

Sweta Agrawal 1

Orevaoghene Ahia 1

Adewale Akinfaderin 1

Zainab Akinjobi 1

Solomon Oluwole Akinola 1

Sana Al-Azzawi 1

Jamiil Toure Ali 1

Lama Alkhaled 1

Anuoluwapo Aremu 1

Oluwabusayo Olufunke Awoyomi 1

Hassan Ayinde 1

Abdallah Bashir 1

Blessing Bassey 1

Musie Meressa Berhe 1

Sofia Bourhim 1

Eleftheria Briakou 1

Andiswa Bukula 1

Marine Carpuat 1

Chiamaka Chukwuneke 1

Idris Abdulkadir Dangana 1

Bonaventure F. P. Dossou 1

Goodness Duru 1

Salma El Anigri 1

Chris Chinenye Emezue 1

Murhabazi Espoir 1

Naome A. Etori 1

Ignatius Ezeani 1

Taiwo Fagbohungbe 1

Oumaima Hourrane 1

Ruqayya Nasir Iro 1

Salomon Kabongo Kabenamualu 1

Herman Kamper 1

Wangui Kimotho 1

Wangari Kimotho 1

Ghollah Kioko 1

Njoroge Kiragu 1

Julia Kreutzer 1

Thabiso Mangwana 1

Vukosi Marivate 1

Laura Martinus 1

Tshinondiwa Matsila 1

Chinedu Emmanuel Mbonu 1

Muhidin Mohamed 1

Shafie Abdi Mohamed 1

Hamam Mokayed 1

Masabata Mokgesi-Selinga 1

Stephen E. Moore 1

Christine Mwase 1

Lolwethu Ndolela 1

Wilhelmina Nekoto 1

Rubungo Andre Niyongabo 1

Nnaemeka Obiefuna 1

Millicent Ochieng 1

Sam Ochieng’ 1

Onyekachi Raphael Ogbu 1

Kelechi Ogueji 1

Lawrence Okegbemi 1

Ayodele Olabiyi 1

Temitayo Olatoye 1

Abdul-Hakeem Omotayo 1

Christopher Onyefuluchi 1

Bernard Opoku 1

Verrah Akinyi Otiende 1

Arshath Ramkilowan 1

Freshia Sackey 1

Iyanuoluwa Shode 1

Blessing Kudzaishe Sibanda 1

Kathleen Siminyu 1

Clemencia Siro 1

Pontus Stenetorp 1

Kolawole Tajudeen 1

Sakayo Toadoum Sari 1

Lyse Naomi Wamba Momo 1

Jason Webster 1

Daniel Whitenack 1

Foutse Yuehgoh 1

Elan van Biljon 1

Venues

Findings 1
NAACL 1