AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

Jiayi Wang, David Adelani, Sweta Agrawal, Marek Masiak, Ricardo Rei, Eleftheria Briakou, Marine Carpuat, Xuanli He, Sofia Bourhim, Andiswa Bukula, Muhidin Mohamed, Temitayo Olatoye, Tosin Adewumi, Hamam Mokayed, Christine Mwase, Wangui Kimotho, Foutse Yuehgoh, Anuoluwapo Aremu, Jessica Ojo, Shamsuddeen Muhammad, Salomey Osei, Abdul-Hakeem Omotayo, Chiamaka Chukwuneke, Perez Ogayo, Oumaima Hourrane, Salma El Anigri, Lolwethu Ndolela, Thabiso Mangwana, Shafie Mohamed, Hassan Ayinde, Oluwabusayo Awoyomi, Lama Alkhaled, Sana Al-azzawi, Naome Etori, Millicent Ochieng, Clemencia Siro, Njoroge Kiragu, Eric Muchiri, Wangari Kimotho, Toadoum Sari Sakayo, Lyse Naomi Wamba, Daud Abolade, Simbiat Ajao, Iyanuoluwa Shode, Ricky Macharm, Ruqayya Iro, Saheed Abdullahi, Stephen Moore, Bernard Opoku, Zainab Akinjobi, Abeeb Afolabi, Nnaemeka Obiefuna, Onyekachi Ogbu, Sam Ochieng’, Verrah Otiende, Chinedu Mbonu, Yao Lu, Pontus Stenetorp

Abstract
Despite recent progress in scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging: evaluation often relies on n-gram matching metrics such as BLEU, which typically correlate more weakly with human judgments than learned metrics. Learned metrics such as COMET correlate better, but the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines such as Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data (AfriMTE) with simplified MQM guidelines for error detection and direct assessment (DA) scoring, covering 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a suite of COMET evaluation metrics for African languages, by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R). The resulting metrics achieve state-of-the-art Spearman-rank correlation with human judgments for African-language MT evaluation (0.441).
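In practice, metrics in the COMET family are applied by loading a trained checkpoint and scoring (source, hypothesis, reference) triples, and are meta-evaluated via segment-level correlation with human judgments, as the paper does with Spearman-rank correlation. The sketch below illustrates this workflow with the unbabel-comet toolkit; the Hugging Face model identifier and the example segments are placeholders, not the paper's released artifacts.

```python
# Minimal sketch of COMET-style scoring and meta-evaluation.
# Assumes the unbabel-comet toolkit (pip install unbabel-comet, >= 2.0);
# the model identifier below is a placeholder -- consult the AfriCOMET
# release for the actual checkpoint names.
from comet import download_model, load_from_checkpoint
from scipy.stats import spearmanr

model_path = download_model("masakhane/africomet-stl")  # placeholder ID
model = load_from_checkpoint(model_path)

# Each item pairs a source segment with an MT hypothesis and a reference.
data = [
    {"src": "The weather is pleasant today.",
     "mt": "Hali ya hewa ni nzuri leo.",
     "ref": "Hali ya hewa ni ya kupendeza leo."},
    {"src": "She traveled to Nairobi last week.",
     "mt": "Alisafiri Nairobi wiki iliyopita.",
     "ref": "Alisafiri kwenda Nairobi wiki iliyopita."},
]

pred = model.predict(data, batch_size=8, gpus=0)
print(pred.system_score)  # corpus-level quality estimate
print(pred.scores)        # one score per segment

# Meta-evaluation in the style of the paper: Spearman-rank correlation
# between segment-level metric scores and human DA judgments
# (the DA values here are illustrative, not real annotations).
human_da = [85.0, 72.0]
rho, _ = spearmanr(pred.scores, human_da)
print(f"Spearman rho: {rho:.3f}")
```

In the standard COMET implementation, the corpus-level system_score is simply the mean of the segment-level scores, so the segment scores are what matter for the correlation analysis above.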
Anthology ID:
2024.naacl-long.334
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5997–6023
URL:
https://aclanthology.org/2024.naacl-long.334
Cite (ACL):
Jiayi Wang, David Adelani, Sweta Agrawal, Marek Masiak, Ricardo Rei, Eleftheria Briakou, Marine Carpuat, Xuanli He, Sofia Bourhim, Andiswa Bukula, Muhidin Mohamed, Temitayo Olatoye, Tosin Adewumi, Hamam Mokayed, Christine Mwase, Wangui Kimotho, Foutse Yuehgoh, Anuoluwapo Aremu, Jessica Ojo, et al. 2024. AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5997–6023, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages (Wang et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.334.pdf
Copyright:
https://aclanthology.org/2024.naacl-long.334.copyright.pdf