Olanrewaju Samuel - ACL Anthology

Olanrewaju Samuel

2024

A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We present the largest cumulative dataset to date for Creole language MT, including 14.5M unique Creole sentences with parallel translations (11.6M of which we release publicly) and the largest bitexts gathered to date for 41 languages (the first ever for 21). In addition, we provide MT models supporting all 41 Creole languages in 172 translation directions. Given our diverse dataset, we produce a model for Creole language MT exposed to more genre diversity than ever before, which outperforms a genre-specific Creole MT model on its own benchmark for 23 of 34 translation directions.

2023

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
Cheikh M. Bamba Dione | David Ifeoluwa Adelani | Peter Nabende | Jesujoba Alabi | Thapelo Sindane | Happy Buzaaba | Shamsuddeen Hassan Muhammad | Chris Chinenye Emezue | Perez Ogayo | Anuoluwapo Aremu | Catherine Gitau | Derguene Mbaye | Jonathan Mukiibi | Blessing Sibanda | Bonaventure F. P. Dossou | Andiswa Bukula | Rooweither Mabuya | Allahsera Auguste Tapo | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Fatoumata Ouoba Kabore | Amelia Taylor | Godson Kalipe | Tebogo Macucwa | Vukosi Marivate | Tajuddeen Gwadabe | Mboning Tchiaze Elvis | Ikechukwu Onyenwe | Gratien Atindogbe | Tolulope Adelani | Idris Akinade | Olanrewaju Samuel | Marien Nahimana | Théogène Musabeyezu | Emile Niyomutabazi | Ester Chimhenga | Kudzai Gotosa | Patrick Mizha | Apelete Agbolo | Seydou Traore | Chinedu Uchechukwu | Aliyu Yusuf | Muhammad Abdullahi | Dietrich Klakow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

MasakhaNEWS: News Topic Classification for African languages
David Ifeoluwa Adelani | Marek Masiak | Israel Abebe Azime | Jesujoba Alabi | Atnafu Lambebo Tonja | Christine Mwase | Odunayo Ogundepo | Bonaventure F. P. Dossou | Akintunde Oladipo | Doreen Nixdorf | Chris Chinenye Emezue | Sana Al-azzawi | Blessing Sibanda | Davis David | Lolwethu Ndolela | Jonathan Mukiibi | Tunde Ajayi | Tatiana Moteu | Brian Odhiambo | Abraham Owodunni | Nnaemeka Obiefuna | Muhidin Mohamed | Shamsuddeen Hassan Muhammad | Teshome Mulugeta Ababu | Saheed Abdullahi Salahudeen | Mesay Gemeda Yigezu | Tajuddeen Gwadabe | Idris Abdulmumin | Mahlet Taye | Oluwabusayo Awoyomi | Iyanuoluwa Shode | Tolulope Adelani | Habiba Abdulganiyu | Abdul-Hakeem Omotayo | Adetola Adeeko | Abeeb Afolabi | Anuoluwapo Aremu | Olanrewaju Samuel | Clemencia Siro | Wangari Kimotho | Onyekachi Ogbu | Chinedu Mbonu | Chiamaka Chukwuneke | Samuel Fanijo | Jessica Ojo | Oyinkansola Awosan | Tadesse Kebede | Toadoum Sari Sakayo | Pamela Nyatsine | Freedmore Sidume | Oreen Yousuf | Mardiyyah Oduwole | Kanda Tshinu | Ussen Kimanuka | Thina Diko | Siyanda Nxakama | Sinodos Nigusse | Abdulmejid Johar | Shafie Mohamed | Fuad Mire Hassan | Moges Ahmed Mehamed | Evrard Ngabire | Jules Jules | Ivan Ssenkungu | Pontus Stenetorp
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Co-authors

Chris Chinenye Emezue 2

Tajuddeen Gwadabe 2

Shamsuddeen Hassan Muhammad 2

Jonathan Mukiibi 2

Blessing Kudzaishe Sibanda 2

Teshome Mulugeta Ababu 1

Habiba Abdulganiyu 1

Muhammad Abdullahi 1

Idris Abdulmumin 1

Adetola Adeeko 1

Abeeb Afolabi 1

Apelete Agbolo 1

Idris Akinade 1

Sana Al-Azzawi 1

Gratien Atindogbe 1

Oyinkansola Awosan 1

Oluwabusayo Awoyomi 1

Israel Abebe Azime 1

Bismarck Bamfo Odoom 1

Claire Bizon Monroc 1

Andiswa Bukula 1

Happy Buzaaba 1

Ester Chimhenga 1

Chiamaka Chukwuneke 1

Cheikh M. Bamba Dione 1

Mboning Tchiaze Elvis 1

Naome A. Etori 1

Samuel Fanijo 1

Catherine Gitau 1

Kudzai Gotosa 1

Morgan Grobol 1

Fuad Mire Hassan 1

Abdulmejid Johar 1

Godson Kalipe 1

Tadesse Kebede 1

Sanjeev Khudanpur 1

Ussen Kimanuka 1

Wangari Kimotho 1

Dietrich Klakow 1

Rooweither Mabuya 1

Tebogo Macucwa 1

Vukosi Marivate 1

Derguene Mbaye 1

Chinedu Mbonu 1

Moges Ahmed Mehamed 1

Victoire Memdjokam Koagne 1

Patrick Mizha 1

Muhidin Mohamed 1

Shafie Mohamed 1

Tatiana Moteu 1

Hasan Muhammad 1

Edwin Munkoh-Buabeng 1

Kenton Murray 1

Théogène Musabeyezu 1

Christine Mwase 1

Peter Nabende 1

Marien Nahimana 1

Lolwethu Ndolela 1

Evrard Ngabire 1

Sinodos Nigusse 1

Doreen Nixdorf 1

Emile Niyomutabazi 1

Siyanda Nxakama 1

Pamela Nyatsine 1

Nnaemeka Obiefuna 1

Brian Odhiambo 1

Mardiyyah Oduwole 1

Onyekachi Ogbu 1

Odunayo Ogundepo 1

Akintunde Oladipo 1

Abdul-Hakeem Omotayo 1

Onenamiyi Onesi 1

Ikechukwu Onyenwe 1

Fatoumata Ouoba Kabore 1

Abraham Toluwase Owodunni 1

Stephen D. Richardson 1

Nathaniel R. Robinson 1

Toadoum Sari Sakayo 1

Saheed Abdullahi Salahudeen 1

Iyanuoluwa Shode 1

Freedmore Sidume 1

Thapelo Sindane 1

Clemencia Siro 1

Ivan Ssenkungu 1

Pontus Stenetorp 1

Matthew Dean Stutzman 1

Allahsera Auguste Tapo 1

Amelia Taylor 1

Vijay Murari Tiyyala 1

Atnafu Lambebo Tonja 1

Seydou Traore 1

Chinedu Uchechukwu 1

Mesay Gemeda Yigezu 1

Venues