Ernests Lavrinovics


2024

CreoleVal: Multilingual Multitask Benchmarks for Creoles
Heather Lent | Kushal Tatariya | Raj Dabre | Yiyi Chen | Marcell Fekete | Esther Ploeger | Li Zhou | Ruth-Ann Armstrong | Abee Eijansantos | Catriona Malau | Hans Erik Heje | Ernests Lavrinovics | Diptesh Kanojia | Paul Belony | Marcel Bollmann | Loïc Grobol | Miryam de Lhoneux | Daniel Hershcovich | Michel DeGraff | Anders Søgaard | Johannes Bjerva
Transactions of the Association for Computational Linguistics, Volume 12

Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential for transfer learning, this potential is hampered by the lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of novel development datasets for reading comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, we see CreoleVal as an opportunity to empower research on Creoles in NLP and computational linguistics, and in general, a step towards more equitable language technology around the globe.

Leveraging Adapters for Improved Cross-lingual Transfer for Low-Resource Creole MT
Marcell Richard Fekete | Ernests Lavrinovics | Nathaniel Romney Robinson | Heather Lent | Raj Dabre | Johannes Bjerva
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)

EXTENDED ABSTRACT INTRODUCTION

Creole languages are low-resource languages, often genetically related to languages like English, French, and Portuguese, due to their linguistic histories with colonialism (DeGraff, 2003). As such, Creoles stand to benefit greatly from both data-efficient methods and transfer learning from high-resource languages. At the same time, it has been observed by Lent et al. (2022b) that machine translation (MT) is a language technology highly desired by speakers of many Creoles. To this end, recent works have contributed new datasets, allowing for the development and evaluation of MT systems for Creoles (Robinson et al., 2024; Lent et al., 2024). In this work, we explore the use of the limited monolingual and parallel data for Creoles using parameter-efficient adaptation methods. Specifically, we compare the performance of different adapter architectures over the set of available benchmarks. We find adapters to be a promising approach for Creoles because they are parameter-efficient and have been shown to leverage transfer learning between related languages (Faisal and Anastasopoulos, 2022). While we perform experiments across multiple Creoles, we present results only for Haitian Creole in this extended abstract. For future work, we aim to explore the potential of leveraging other high-resource languages for parameter-efficient transfer learning.