Gbemileke Onilude


2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Ahmet Üstün | Viraat Aryabumi | Zheng Yong | Wei-Yin Ko | Daniel D’souza | Gbemileke Onilude | Neel Bhandari | Shivalika Singh | Hui-Lee Ooi | Amr Kayid | Freddie Vargus | Phil Blunsom | Shayne Longpre | Niklas Muennighoff | Marzieh Fadaee | Julia Kreutzer | Sara Hooker
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages, including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models.

2022

Intriguing Properties of Compression on Multilingual Models
Kelechi Ogueji | Orevaoghene Ahia | Gbemileke Onilude | Sebastian Gehrmann | Sara Hooker | Julia Kreutzer
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that under certain sparsification regimes compression may aid, rather than disproportionately impact, the performance of low-resource languages.