Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan Clark, Sebastian Ruder, David Adelani, Bonaventure Dossou, Abdou Diop, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Emezue, Albert Kahira, Shamsuddeen Muhammad, Akintunde Oladipo, Abraham Owodunni, Atnafu Tonja, Iyanuoluwa Shode, Akari Asai, Anuoluwapo Aremu, Ayodele Awokoya, Bernard Opoku, Chiamaka Chukwuneke, Christine Mwase, Clemencia Siro, Stephen Arthur, Tunde Ajayi, Verrah Otiende, Andre Rubungo, Boyd Sinkala, Daniel Ajisafe, Emeka Onwuegbuzia, Falalu Lawan, Ibrahim Ahmad, Jesujoba Alabi, Chinedu Mbonu, Mofetoluwa Adeyemi, Mofya Phiri, Orevaoghene Ahia, Ruqayya Iro, Sonia Adhiambo
Abstract
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems – those that retrieve answer content from other languages while serving people in their native language – offer a means of filling this gap. To this end, we create Our Dataset, the first cross-lingual QA dataset with a focus on African languages. Our Dataset includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, Our Dataset focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, Our Dataset proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
- Anthology ID:
- 2023.findings-emnlp.997
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 14957–14972
- URL:
- https://aclanthology.org/2023.findings-emnlp.997
- DOI:
- 10.18653/v1/2023.findings-emnlp.997
- Bibkey:
- ogundepo-etal-2023-cross
- Cite (ACL):
- Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan Clark, Sebastian Ruder, David Adelani, Bonaventure Dossou, Abdou Diop, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Emezue, Albert Kahira, Shamsuddeen Muhammad, Akintunde Oladipo, Abraham Owodunni, et al. 2023. Cross-lingual Open-Retrieval Question Answering for African Languages. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14957–14972, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Cross-lingual Open-Retrieval Question Answering for African Languages (Ogundepo et al., Findings 2023)
- PDF:
- https://aclanthology.org/2023.findings-emnlp.997.pdf
Export citation
@inproceedings{ogundepo-etal-2023-cross,
    title = "Cross-lingual Open-Retrieval Question Answering for {A}frican Languages",
    author = "Ogundepo, Odunayo and Gwadabe, Tajuddeen and Rivera, Clara and Clark, Jonathan and Ruder, Sebastian and Adelani, David and Dossou, Bonaventure and Diop, Abdou and Sikasote, Claytone and Hacheme, Gilles and Buzaaba, Happy and Ezeani, Ignatius and Mabuya, Rooweither and Osei, Salomey and Emezue, Chris and Kahira, Albert and Muhammad, Shamsuddeen and Oladipo, Akintunde and Owodunni, Abraham and Tonja, Atnafu and Shode, Iyanuoluwa and Asai, Akari and Aremu, Anuoluwapo and Awokoya, Ayodele and Opoku, Bernard and Chukwuneke, Chiamaka and Mwase, Christine and Siro, Clemencia and Arthur, Stephen and Ajayi, Tunde and Otiende, Verrah and Rubungo, Andre and Sinkala, Boyd and Ajisafe, Daniel and Onwuegbuzia, Emeka and Lawan, Falalu and Ahmad, Ibrahim and Alabi, Jesujoba and Mbonu, Chinedu and Adeyemi, Mofetoluwa and Phiri, Mofya and Ahia, Orevaoghene and Iro, Ruqayya and Adhiambo, Sonia",
    editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.997",
    doi = "10.18653/v1/2023.findings-emnlp.997",
    pages = "14957--14972",
    abstract = "African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems {--} those that retrieve answer content from other languages while serving people in their native language{---}offer a means of filling this gap. To this end, we create Our Dataset, the first cross-lingual QA dataset with a focus on African languages. Our Dataset includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, Our Dataset focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, Our Dataset proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.",
}
Markdown (Informal)
[Cross-lingual Open-Retrieval Question Answering for African Languages](https://aclanthology.org/2023.findings-emnlp.997) (Ogundepo et al., Findings 2023)