Gilles Hacheme - ACL Anthology

Gilles Hacheme

2023

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo | Tajuddeen R. Gwadabe | Clara E. Rivera | Jonathan H. Clark | Sebastian Ruder | David Ifeoluwa Adelani | Bonaventure F. P. Dossou | Abdou Aziz Diop | Claytone Sikasote | Gilles Hacheme | Happy Buzaaba | Ignatius Ezeani | Rooweither Mabuya | Salomey Osei | Chris Emezue | Albert Njoroge Kahira | Shamsuddeen Hassan Muhammad | Akintunde Oladipo | Abraham Toluwase Owodunni | Atnafu Lambebo Tonja | Iyanuoluwa Shode | Akari Asai | Tunde Oluwaseyi Ajayi | Clemencia Siro | Steven Arthur | Mofetoluwa Adeyemi | Orevaoghene Ahia | Anuoluwapo Aremu | Oyinkansola Awosan | Chiamaka Chukwuneke | Bernard Opoku | Awokoya Ayodele | Verrah Otiende | Christine Mwase | Boyd Sinkala | Andre Niyongabo Rubungo | Daniel A. Ajisafe | Emeka Felix Onwuegbuzia | Habib Mbow | Emile Niyomutabazi | Eunice Mukonde | Falalu Ibrahim Lawan | Ibrahim Said Ahmad | Jesujoba O. Alabi | Martin Namukombo | Mbonu Chinedu | Mofya Phiri | Neo Putini | Ndumiso Mngoma | Priscilla A. Amouk | Ruqayya Nasir Iro | Sonia Adhiambo
Findings of the Association for Computational Linguistics: EMNLP 2023

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems, those that retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

2022

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani | Jesujoba Oluwadara Alabi | Angela Fan | Julia Kreutzer | Xiaoyu Shen | Machel Reid | Dana Ruiter | Dietrich Klakow | Peter Nabende | Ernie Chang | Tajuddeen Gwadabe | Freshia Sackey | Bonaventure F. P. Dossou | Chris Emezue | Colin Leong | Michael Beukman | Shamsuddeen H. Muhammad | Guyo D. Jarso | Oreen Yousuf | Andre N. Niyongabo Rubungo | Gilles Hacheme | Eric Peter Wairagala | Muhammad Umair Nasir | Benjamin A. Ajibade | Tunde Oluwaseyi Ajayi | Yvonne Wambui Gitau | Jade Abbott | Mohamed Ahmed | Millicent Ochieng | Anuoluwapo Aremu | Perez Ogayo | Jonathan Mukiibi | Fatoumata Ouoba Kabore | Godson Koffi Kalipe | Derguene Mbaye | Allahsera Auguste Tapo | Victoire M. Memdjokam Koagne | Edwin Munkoh-Buabeng | Valencia Wagner | Idris Abdulmumin | Ayodele Awokoya | Happy Buzaaba | Blessing Sibanda | Andiswa Bukula | Sam Manthalu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out of these datasets. This is primarily because many widely spoken languages are not well represented on the web and are therefore excluded from the large-scale crawls used to build these datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a novel African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both additional languages and additional domains is to leverage small quantities of high-quality translation data to fine-tune large pre-trained models.

Co-authors

Bonaventure F. P. Dossou 2

Chris Chinenye Emezue 2

Shamsuddeen Hassan Muhammad 2

Idris Abdulmumin 1

Mofetoluwa Adeyemi 1

Sonia Adhiambo 1

Orevaoghene Ahia 1

Ibrahim Said Ahmad 1

Mohamed Ahmed 1

Benjamin A. Ajibade 1

Daniel A. Ajisafe 1

Priscilla A. Amouk 1

Steven Arthur 1

Ayodele Awokoya 1

Oyinkansola Awosan 1

Awokoya Ayodele 1

Michael Beukman 1

Andiswa Bukula 1

Mbonu Chinedu 1

Chiamaka Chukwuneke 1

Jonathan H. Clark 1

Abdou Aziz Diop 1

Ignatius Ezeani 1

Yvonne Wambui Gitau 1

Tajuddeen Gwadabe 1

Tajuddeen R. Gwadabe 1

Ruqayya Nasir Iro 1

Guyo D. Jarso 1

Albert Njoroge Kahira 1

Godson Koffi Kalipe 1

Dietrich Klakow 1

Julia Kreutzer 1

Falalu Ibrahim Lawan 1

Rooweither Mabuya 1

Derguene Mbaye 1

Victoire M. Memdjokam Koagne 1

Ndumiso Mngoma 1

Jonathan Mukiibi 1

Eunice Mukonde 1

Edwin Munkoh-Buabeng 1

Christine Mwase 1

Peter Nabende 1

Martin Namukombo 1

Muhammad Umair Nasir 1

Emile Niyomutabazi 1

Andre N. Niyongabo Rubungo 1

Millicent Ochieng 1

Odunayo Ogundepo 1

Akintunde Oladipo 1

Emeka Felix Onwuegbuzia 1

Bernard Opoku 1

Verrah Akinyi Otiende 1

Fatoumata Ouoba Kabore 1

Abraham Toluwase Owodunni 1

Clara E. Rivera 1

Andre Niyongabo Rubungo 1

Sebastian Ruder 1

Freshia Sackey 1

Iyanuoluwa Shode 1

Blessing Kudzaishe Sibanda 1

Claytone Sikasote 1

Clemencia Siro 1

Allahsera Auguste Tapo 1

Atnafu Lambebo Tonja 1

Valencia Wagner 1

Eric Peter Wairagala 1

Venues

Findings 1
NAACL 1