Ruth-Ann Armstrong
2024
CreoleVal: Multilingual Multitask Benchmarks for Creoles
Heather Lent | Kushal Tatariya | Raj Dabre | Yiyi Chen | Marcell Fekete | Esther Ploeger | Li Zhou | Ruth-Ann Armstrong | Abee Eijansantos | Catriona Malau | Hans Erik Heje | Ernests Lavrinovics | Diptesh Kanojia | Paul Belony | Marcel Bollmann | Loïc Grobol | Miryam de Lhoneux | Daniel Hershcovich | Michel DeGraff | Anders Søgaard | Johannes Bjerva
Transactions of the Association for Computational Linguistics, Volume 12
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential for transfer learning, this potential is hampered by the lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of novel development datasets for reading comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, we see CreoleVal as an opportunity to empower research on Creoles in NLP and computational linguistics, and, in general, a step towards more equitable language technology around the globe.
2022
JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset
Ruth-Ann Armstrong | John Hewitt | Christopher Manning
Findings of the Association for Computational Linguistics: EMNLP 2022
JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally-occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.