JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally-occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.


Introduction
The extensive progress that has been made in NLP research in recent years has largely been constrained to around 20 of the 7000 languages spoken around the world (Magueresse et al., 2020). Creole languages, which emerge as a result of contact between speakers of different vernaculars, are even further underexplored (Lent et al., 2022b).
This work contributes to addressing this gap. We present JamPatoisNLI, the first natural language inference dataset in Jamaican Patois, an English-based creole spoken in the Caribbean. Additionally, to our knowledge, no natural language inference corpus exists for any other creole language.
Jamaican Patois is one of over 100 creole languages spoken by millions of inhabitants of different regions across the world, including Africa, the Caribbean, the Americas, islands in the Indian Ocean and the Pacific Ocean (including Australia and the Philippines) and South Asia (Romaine, 2017; Bakker and Daval-Markussen, 2013). Though there has been a recent spike in interest in work on low-resource languages in the NLP community (Kuriyozov et al., 2022; Kumar et al., 2022; Ebrahimi et al., 2021; Inuwa-Dutse, 2021; Hasan et al., 2020; Agić and Vulić, 2019; Chowdhury et al., 2018; Kumar et al., 2019; Das et al., 2017; Adewumi, 2022), creoles in particular are extremely under-explored in spite of the prevalence of their usage globally (Lent et al., 2022b). Working more with this class of languages is an important step in ensuring that the benefits of NLP technology are more equitably distributed globally.
Additionally, the class of creole languages is a uniquely interesting point of study within the space of multilingual NLP. Though creoles like Jamaican Patois have distinct morphosyntactic features, they often share significant lexical overlap with the high-resource base languages from which they are derived. This makes it possible to study cross-lingual transfer between high-resource and low-resource languages that are distinct, but share similar lexicons. In particular, JamPatoisNLI provides a benchmark for NLP researchers working to understand cross-lingual transfer to languages outside the training data of large pretrained multilingual models. Creole languages like Jamaican Patois have the unique property of being outside the pretraining data of these models, yet highly related to their base languages, which are present in the datasets used to train the models. JamPatoisNLI was constructed using both naturally occurring and newly constructed utterances of Jamaican Patois rather than through translation. This mitigates the problem of skewed cross-lingual transfer results which arises when the test dataset consists of translated examples but the training dataset does not (Artetxe et al., 2020). This also enhances the ecological validity (de Vries et al., 2020) of the dataset, as it is grounded in real world usage of the language and is thus a more relevant, realistic benchmark. These two features mean that work done with the dataset will be particularly useful for moving towards developing technologies for speakers of the language.
We run studies on JamPatoisNLI transferring from monolingual English BERT, multilingual BERT, monolingual English RoBERTa and multilingual XLM-RoBERTa, finetuned on the Multi-NLI dataset, in zero-shot and few-shot settings. We find that monolingual English RoBERTa (76.50%) and multilingual XLM-RoBERTa (75.17%) achieve similar accuracies when we use the entire few-shot JamPatoisNLI training dataset with 250 examples for further finetuning. We also find that the monolingual English BERT model (66.17%) and the multilingual BERT model (65.33%) achieve similar accuracies when we use the entire few-shot JamPatoisNLI training dataset. In our experiments, the RoBERTa-based models strongly outperform the BERT-based models. Additionally, we find that few-shot performance on JamPatoisNLI increases much faster (with respect to the number of few-shot training examples) than on languages in AmericasNLI, which have no strong connection to a high-resource language (Ebrahimi et al., 2021). Lastly, we run qualitative experiments which leverage the relatedness between Jamaican Patois and English to understand which differences between the languages boost or inhibit the effectiveness of cross-lingual transfer.
We hope that JamPatoisNLI prompts long-term research into building NLP tools that consider the particular difficulties and opportunities of NLP for Jamaican Patois and creole languages in general.

Related Work
Natural Language Inference Datasets. Natural language inference (NLI), or recognizing textual entailment, is a standard benchmark task for natural language understanding (Consortium et al., 1996; Dagan et al., 2005; Storks et al., 2019).
The input to the task is a pair of sentences: the premise and the hypothesis. The goal is to output a label (entailment, neutral or contradiction) to describe the relationship between the pair. Various approaches have been used to create NLI corpora. The Stanford NLI (SNLI) (Bowman et al., 2015), Multi-NLI (MNLI) (Williams et al., 2018) and Adversarial NLI (ANLI) (Williams et al., 2020) English datasets, the esXNLI Spanish dataset (Artetxe et al., 2020), the Original Chinese Natural Language Inference (OCNLI) dataset (Hu et al., 2020) and the code-mixed Hindi-English dataset (Khanuja et al., 2020) all consist of a mixture of pre-existing sentences and crowdsourced sentences. In the Japanese Realistic Textual Entailment Corpus, a collection of pre-existing sentences is filtered and paired using machine learning methods, then manually annotated with labels (Yanaka and Mineshima, 2021).
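The premise, hypothesis and label format described above can be sketched as a simple record type. The example pair below is an illustrative English one written for this sketch, not drawn from any of these datasets:

```python
from dataclasses import dataclass

LABELS = ("entailment", "neutral", "contradiction")

@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # one of LABELS

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")

# Illustrative pair: the hypothesis cannot be true if the premise is.
ex = NLIExample(
    premise="The beach was crowded with tourists.",
    hypothesis="Nobody was at the beach.",
    label="contradiction",
)
```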
Other NLI corpora have been made using translation techniques. The Natural Language Inference in Turkish (NLI-TR) dataset (Budur et al., 2020) was created using Amazon Translate on SNLI and MNLI. The Cross-Lingual NLI (XNLI) Corpus (Conneau et al., 2018) was created by collecting and crowd-sourcing 750 examples, then hiring human translators to translate the sentences into 15 languages. Extensions of this dataset to low-resource languages such as AmericasNLI (Ebrahimi et al., 2021) and IndicXNLI (Aggarwal et al., 2022) have been created using human and machine translation methods. However, subsequent research has found that translation-based approaches to creating datasets can introduce subtle artifacts which can lead to skewed accuracies for cross-lingual transfer methods (Artetxe et al., 2020). JamPatoisNLI mitigates this problem by using original rather than translated examples.
In spite of the examples given above, there is generally a relative dearth of datasets and research into methods for low-resource languages across NLI and other tasks. Low-resource languages can be defined as those which are 'less studied, resource scarce, less computerized, less privileged, less commonly taught or low density' (Magueresse et al., 2020).
Creole Languages in NLP. Creole languages are typically low-resource. These languages arise through the process of creolization of another class of languages called pidgins. Pidgins emerge as a result of contact between two or more groups of speakers which do not have a common language. A pidgin evolves to become a creole when it becomes the native language of the children of its speakers (Muysken et al., 1995). Within the NLP community, a few datasets for different tasks have been created for creoles using a variety of methods. NaijaSenti is a human-annotated Twitter sentiment analysis dataset which is partly comprised of 14,000 tweets in Nigerian Pidgin or Naija, an English-based creole (Muhammad et al., 2022). The authors find that code-switching between these languages and English is a common feature in the dataset. They explore language-adaptive finetuning and zero-shot cross-lingual transfer from multilingual pretrained models, and achieve promising results. Cross-lingual Choice of Plausible Alternatives (XCOPA) (Ponti et al., 2020) is a multilingual dataset for causal common sense reasoning in 11 languages, one of which is Haitian Creole, that was created by translating English COPA. The authors find that across the languages in the dataset, translation-based approaches outperform methods which employ multilingual pretraining and finetuning. A part-of-speech tagging and dependency parsing corpus for Colloquial Singaporean English (Singlish), an English-based creole, has also been created (Wang et al., 2017) and further expanded (Wang et al., 2019) using the Universal Dependencies (Nivre et al., 2020) scheme. The dataset was created by crawling pages on online Singaporean forums.
Other work has also explored using machine learning methods for identifying and generating creole text. Chang et al. (2022) use contrastive learning to finetune BART (Lewis et al., 2019) so that the model produces novel dialogue texts in Naija and Yaounde (both English-based creoles). Soto (2020) uses a FastText (Joulin et al., 2016) based supervised classifier to identify instances of sentences in Guadeloupean Creole within a multilingual dataset.
The use of machine learning models on creole languages has also been investigated. Lent et al. (2021) find that standard language models work better than distributionally robust ones on creoles, which shows that these languages are relatively stable. Lent et al. (2022a) show that ancestor-to-creole transfer is non-trivial.
Jamaican Patois

Description of the Language

Jamaican Patois (or Jamaican Creole) is an English-based creole spoken by over 3 million inhabitants on the island and by Jamaicans across the diaspora globally (Mair, 2003). Jamaican Patois resulted from contact between enslaved Africans brought to the island in the 17th century and British colonists. Because it is a hybrid of the languages spoken by the two groups of people that came in contact, it exists on a continuum that ranges from more to less dissimilar to English (Davidson and Schwartz, 1995). The terms for the classes in the continuum are the acrolect (variations which are closest to English), the basilect (variations which are furthest from English) and the mesolect (variations which are in between) (Patrick, 2019). Examples of each are shown in Table 1.

Class      Example
Basilect   Me a nyam di bickle weh dem gi mi.
Mesolect   Me a eat di food weh dem gi mi.
Acrolect   I'm eating the food that they gave me.

Table 1: Different translations of 'I'm eating the food that they gave me' in Jamaican Patois. The basilectal extreme of the continuum consists of words that are nearly exclusively non-English. On the acrolectal extreme of the spectrum (or Jamaican Standard English), the example is identical to English.

Relevant Linguistic Features
Unstandardized Orthography. Jamaican Patois is primarily a spoken language. Though there have been efforts to develop a formal writing system for the language, none that have been developed are widely used by speakers of Patois. Instead, speakers use spelling patterns that reflect how words in Patois are pronounced. This is illustrated in Table 2. In the table, 'I want' is spelt both 'Me wah' and 'Mi waa': though the phrases yield similar pronunciations, different spellings are used.

Jamaican Patois   English
Me wah bawl.      I want to cry.
Mi waa cook.      I want to cook.

Vocabulary Overlap with English. Since Jamaican Patois is English-based, there is a high degree of overlap between the vocabularies used by the two languages, in spite of differences in spelling, tense and structure.
We present an example of this in the quote below. Strictly non-English vocabulary (including words such as 'a' that have different meanings in English), highlighted in bold, accounts for less than one-third of the words in the sentence.
It look like more tourist start come since dem loosen up di restrictions dem.Mi frighten fi see how di beach full wen mi go a Negril weh day.
Therefore, JamPatoisNLI will be useful for evaluating the efficacy of methods for linguistic transfer in scenarios where there is a high degree of overlap between the source and target language.
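The degree of lexical overlap in a sentence like the one quoted above can be estimated by checking each token against an English word list. The tiny vocabulary below is a toy assumption for illustration; a real measurement would use a full English lexicon:

```python
def overlap_fraction(sentence, english_vocab):
    """Fraction of tokens that appear in an English word list
    (case-insensitive, ignoring trailing punctuation)."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    shared = sum(t in english_vocab for t in tokens)
    return shared / len(tokens)

# Toy English vocabulary covering only this example's English tokens.
ENGLISH = {"it", "look", "like", "more", "tourist", "start", "come",
           "since", "loosen", "up", "restrictions"}

sentence = ("It look like more tourist start come since dem "
            "loosen up di restrictions dem.")
```

On this sentence, 11 of the 14 tokens are in the English list, consistent with the less-than-one-third figure for strictly non-English vocabulary.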
Negation.Common markers of negation used in Jamaican Patois and their English equivalents which feature in the dataset are presented in Table 3. Examples of these markers in the dataset are presented in Table 17 in the Appendix.
Negation markers are important linguistic features in the context of NLI datasets, as their presence and interaction with other sentence components are highly relevant to the determination of the right classification for a given textual entailment example (Gururangan et al., 2018).

Constructing JamPatoisNLI
For each example in the dataset, we pulled the premise from a pre-existing text source. Then, a label was randomly selected and a corresponding hypothesis was written by the first author, who speaks and writes Jamaican Patois fluently. Because Jamaican Patois is a low-resource language (Magueresse et al., 2020), the resulting dataset is small. However, for the purposes of our experiments, the sizes of the training, validation and testing sets are sufficient for exploring few-shot finetuning techniques and obtaining useful signals about the effectiveness of different methods.

Premise Collection
Since Jamaican Patois is primarily a spoken language, there is a limited number of textual sources of Patois that are readily available online. However, Patois speakers regularly use the language for communication on social media, and in literature. These are the sources that were used for the premises in the dataset. Around 97% of examples are drawn from Twitter and the remaining examples are drawn from a cultural website, jamaicans.com, and from literature by the Jamaican poets Dr. Louise Bennett-Coverley and Shelley Sykes-Coley. The number of examples per source is outlined in Table 13 in the Appendix.
This method of construction also makes the dataset less prone to effects from translation artifacts which can skew the effectiveness of different cross-lingual transfer techniques. Artetxe et al. (2020) find that when the test dataset is made using translated examples, there is a slight overestimation of the cross-lingual transfer gap as well as the efficacy of the TRANSLATE-TRAIN technique, and an underestimation of the efficacy of the TRANSLATE-TEST technique. None of these effects are present when the test dataset is composed of original examples which were not created through translation. Additionally, because the premises of JamPatoisNLI are drawn from natural occurrences of Jamaican Patois written by various speakers of the language, the dataset better reflects the natural writing patterns of speakers than those created using machine or human translation techniques.

Hypothesis Construction
The set of hypotheses in the corpus is comprised of novel sentences constructed by our first author, who is a native speaker of Jamaican Patois. For each premise, a corresponding hypothesis was written so that the pair's classification would be either entailment, neutral or contradiction. The criteria used for assignment of pairs to each class are shown in Figure 4 in the Appendix. The constructed hypothesis in each example mimics the diverse spelling conventions and writing patterns used in the corresponding pre-existing premise. As such, the non-standardized nature of Jamaican Patois is reflected in both the collected and constructed sentences in the dataset.
In order to maximize the linguistic diversity of examples in the dataset, each premise was used to generate a single hypothesis (rather than three hypotheses per premise, as was done for MNLI (Williams et al., 2018)).

Label Validation
A random sample of 100 sentence pairs evenly distributed across the three classes was double annotated by fluent speakers of Jamaican Patois. We recruited volunteer annotators by reaching out to friends and colleagues. The labelling criteria given to the annotators were the same as those used to generate the hypotheses, and are outlined in Appendix Figure 4. In Table 6, we present statistics for inter-annotator agreement for these examples. Fleiss' kappa for the sample was 88.99%, while the raw percentage agreement was 89.00%.
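These agreement statistics can be computed as follows. Treating the original gold label plus the two annotators as three raters per item is an assumption of this sketch, since the exact aggregation is not spelled out above:

```python
def percentage_agreement(gold, ann1, ann2):
    """Fraction of items on which both annotators match the gold label."""
    matches = sum(g == a and g == b for g, a, b in zip(gold, ann1, ann2))
    return matches / len(gold)

def fleiss_kappa(counts):
    """Fleiss' kappa from an items x categories table of rating counts,
    where each row sums to the number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Per-category proportion of all assignments.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Per-item observed agreement.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```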

Experiments and Results
Across our experiments, our goals are to:
1. Provide benchmarks for JamPatoisNLI, thus determining the difficulty of the dataset and the effectiveness of cross-lingual transfer.
2. Compare the effectiveness of cross-lingual transfer on JamPatoisNLI (a language that is related to language(s) present in the training corpus of each of the pretrained models we examine) to cross-lingual transfer on AmericasNLI (which contains languages that are unrelated to any language(s) present in the training corpus of each pretrained model).
3. Leverage the nature of Jamaican Patois as a creole to further understand cross-lingual transfer.
The experiments that we conduct are done in the zero-shot and few-shot settings.

General Setup
In our experiments, we use English BERT, multilingual BERT (Devlin et al., 2018), English RoBERTa (Liu et al., 2019) and XLM-RoBERTa (Conneau et al., 2019a) as our base pretrained models. We use a two-layer perceptron with ReLU activations for the classification head, and first finetune on the MNLI training dataset. We use cased and uncased versions of each BERT-based pretrained model, and experiment with frozen and unfrozen versions, for a total of eight types of BERT-based models. For our RoBERTa-based models, we also experiment with frozen and unfrozen versions, for a total of four types of RoBERTa-based models. Throughout our experiments with the twelve model types, we make comparisons among the BERT-based models and the RoBERTa-based models separately.
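A two-layer perceptron head of this kind can be sketched as below. The hidden width (256) and the dropout placement are illustrative assumptions, not values reported here; the pretrained encoder that produces the pooled input vector is omitted:

```python
import torch
import torch.nn as nn

class NLIHead(nn.Module):
    """Two-layer perceptron with a ReLU activation, mapping a pooled
    encoder representation to three NLI logits."""
    def __init__(self, hidden_size=768, inner=256, n_classes=3, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, inner),
            nn.ReLU(),
            nn.Linear(inner, n_classes),
        )

    def forward(self, pooled):
        return self.net(pooled)

head = NLIHead().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 768))  # a batch of 4 pooled vectors
```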
To select the twelve MNLI finetuned models that we use for our few-shot experiments, we conduct a hyperparameter search over dropouts in the range [0.2, 0.5], batch sizes in the range [8, 32], learning rates in the range [1e-06, 1e-05] and epoch counts in the range [2, 10], and pick those that achieved reasonable accuracies on the MNLI development dataset (above 86% for unfrozen models and above 62% for frozen models).
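A grid search of this shape can be sketched as follows. The specific grid points inside each reported range are assumptions for illustration:

```python
import itertools

# Grid over the ranges reported above; the intermediate points are
# illustrative, not the actual values swept.
GRID = {
    "dropout": [0.2, 0.5],
    "batch_size": [8, 16, 32],
    "lr": [1e-5, 5e-6, 1e-6],
    "epochs": [2, 5, 10],
}

def configs(grid):
    """Enumerate every combination of hyperparameter values in the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def select(results, threshold):
    """Keep configurations whose MNLI dev accuracy clears a threshold
    (e.g. 0.86 for unfrozen models, 0.62 for frozen ones)."""
    return [cfg for cfg, acc in results if acc >= threshold]
```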
Among the twelve selected models finetuned on MNLI, we evaluate the zero-shot and few-shot performance on each of our target datasets to determine which model types produce the highest accuracy. To compare the types of models, we fix the hyperparameters to the values in Table 16 in the Appendix, and average over three experiments with different seeds. Then, from among the eight finetuned BERT-based models, we pick the type that achieved the highest scores for the maximum number of few-shot training examples for each of our validation datasets (JamPatoisNLI and AmericasNLI). We do the same for the four finetuned RoBERTa-based models. (In our frozen models, all parameters of the pretrained base model are fixed during finetuning so that only the NLI classification head is updated, while for our unfrozen models, all model parameters are allowed to update.)

Hyperparameter Sweep

After selecting the best model types among the models finetuned on MNLI and further finetuned on the target few-shot datasets, we perform a final hyperparameter sweep. Tables 7 and 8 show the final set of hyperparameters that we arrived at after conducting our sweep for the best models on the JamPatoisNLI and AmericasNLI validation sets, among our BERT-based models and RoBERTa-based models respectively.
In our few-shot finetuning setup, we select one example from each class for each "shot".For instance, using this convention, two-shot finetuning involves finetuning using six examples in total: two from each of the three NLI classes.Additionally, during few-shot finetuning, we keep all layers of the base model unfrozen.
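This class-balanced shot selection can be sketched as below, where "two-shot" yields six examples in total:

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k, seed=0):
    """Select k examples per NLI class, so k-shot finetuning uses 3k examples."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    rng = random.Random(seed)
    shots = []
    for label in sorted(by_label):
        shots.extend(rng.sample(by_label[label], k))
    return shots

# Toy pool with ten examples per class (the real pool is the
# 250-example JamPatoisNLI training set).
pool = [{"label": lab, "id": i}
        for lab in ("entailment", "neutral", "contradiction")
        for i in range(10)]
shots = sample_k_shot(pool, 2)  # "two-shot": six examples in total
```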

Benchmarks for JamPatoisNLI
Setup. For JamPatoisNLI, the best BERT-based model type was the unfrozen uncased English BERT model (bert-uncased-unfrozen), based on accuracies on the validation set.
Using the hyperparameters in Table 7, we also make comparisons to a hypothesis-only baseline (bert-uncased-unfrozen), as well as the best multilingual BERT-based model on JamPatoisNLI, which was the unfrozen uncased multilingual BERT model (mbert-uncased-unfrozen).
The best RoBERTa-based model type was the unfrozen English RoBERTa model (roberta-unfrozen). We also include results for the best multilingual RoBERTa-based model on the dataset, which was the unfrozen XLM-RoBERTa model (xlm-unfrozen). The hyperparameters that we used are listed in Table 8.
Results. Our results on the test set are presented in Table 9.
We found that bert-uncased-unfrozen and mbert-uncased-unfrozen had relatively similar accuracies when all few-shot training examples were used (66.17% and 65.33% respectively). We also found that roberta-unfrozen and xlm-unfrozen achieve similar accuracies on the full few-shot dataset (76.50% and 75.17% respectively).
The two RoBERTa-based models significantly outperformed the two BERT-based models; in fact, the zero-shot accuracy of the roberta-unfrozen model (67.50%) exceeds that of both BERT-based models finetuned on the full few-shot dataset.
For our best model (xlm-unfrozen), the standard deviation in percentage accuracy for the maximum number of few-shot examples across ten experiments was 0.75% when evaluated on the validation set and 1.43% when evaluated on the test set.

Comparisons with AmericasNLI
Setup. A natural comparison point for JamPatoisNLI is AmericasNLI (Ebrahimi et al., 2021), as it is also a low-resource NLI dataset. However, unlike Jamaican Patois, the languages in the corpus are not closely related to any high-resource languages for which there are large pretrained language models or large natural language inference training datasets. In particular, the languages in AmericasNLI do not belong to the same family as any of the languages in the two most commonly used multilingual pretrained language models, multilingual BERT (Devlin et al., 2018) and XLM-R (Conneau et al., 2019b). JamPatoisNLI is unseen from the perspective of existing pretrained monolingual or multilingual models but related to the source language(s) involved in transfer learning, whereas AmericasNLI is both unseen and unrelated.
For our experiments, we use five of the languages in the AmericasNLI dataset, and create a randomly selected 250-200-200 train-dev-test split from among the examples in the original development dataset for each language (shown in Table 14 in the Appendix) to mirror the number of examples present in each of the splits in JamPatoisNLI.
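Creating a random 250-200-200 split of this kind can be sketched as:

```python
import random

def make_splits(examples, sizes=(250, 200, 200), seed=0):
    """Randomly partition examples into disjoint train/dev/test splits."""
    n_train, n_dev, n_test = sizes
    assert n_train + n_dev + n_test <= len(examples)
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:n_train + n_dev + n_test]
    return train, dev, test

# Illustrative pool of 650 example ids (the JamPatoisNLI dataset size).
train, dev, test = make_splits(range(650))
```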
For the AmericasNLI languages, the best BERT-based model type based on results on the validation set was the unfrozen cased multilingual BERT model (mbert-cased-unfrozen). The best RoBERTa-based model type was the unfrozen XLM-RoBERTa model (xlm-unfrozen).
Results. We present the results of our experiments on the test set in Table 10. We found that there was a significant gap in accuracies on JamPatoisNLI and AmericasNLI. Across all experiments, both zero-shot and few-shot accuracies for the JamPatoisNLI dataset exceeded those for the AmericasNLI dataset. The best JamPatoisNLI model achieved a zero-shot accuracy of 67.50% while the best AmericasNLI model achieved a zero-shot accuracy of 42.00% (both compared to a 33.50% majority baseline).

Figure 2: Plots for the best AmericasNLI model (mbert-cased-unfrozen) on each language, and the best JamPatoisNLI model (bert-uncased-unfrozen). Experiments are averaged over three seeds and the best models were chosen based on results for the validation set.
This shows that the relatedness between Jamaican Patois and English significantly boosts the effectiveness of cross-lingual transfer learning even in the zero-shot case. For the few-shot setting, the highest accuracy achieved on the JamPatoisNLI dataset was 76.50%. The highest average accuracy achieved on the AmericasNLI dataset was 49.23%.
The plots comparing the best JamPatoisNLI model to the best AmericasNLI model on each of the respective datasets for BERT-based models and RoBERTa-based models are shown in Figures 2 and 3. For the BERT-based models, we see that cross-lingual transfer augmented by few-shot learning is quite effective for JamPatoisNLI, whereas the gains for the AmericasNLI languages are rather modest. Tabulated results for these experiments can be found in Appendix Tables 18 and 19.

Experiments with Transitioning from Jamaican Patois to English

Setup. A key characteristic of Jamaican Patois is that it exists on a spectrum that ranges from highly dissimilar to English (the basilect) to highly similar to English (the acrolect). We experiment with 83-shot classification (the full set of examples in our few-shot training dataset) on an augmented test dataset derived from pairs that were incorrectly classified by at least two of the three models in our original few-shot experiments. To construct this dataset, we picked a single example for each type of misclassification with respect to the three NLI labels, for a total of 6 examples from the original dataset (which mostly fell on various points on the mesolectal range of the creole spectrum). We then wrote English translations for each of these examples (which would fall on the acrolectal end of the creole spectrum) and hand-wrote intermediate translations between them that are all valid Jamaican Patois, to qualitatively study whether (and for what changes) along the path the label becomes correct. We conduct few-shot finetuning using our original training set for three models with different seeds, using the parameters for the best BERT-based JamPatoisNLI model (bert-uncased-unfrozen), listed in Table 7.

Results. We present a qualitative example of this experiment in Table 11. Here, changing the verb from Jamaican Patois to English caused the models to switch to the correct classification. The three models switched to the correct prediction for a change prior to the full translation of the Jamaican Patois example to English for all but one of the originally misclassified examples.

Discussion
We see that the relatedness between Jamaican Patois and English strongly contributes to the effectiveness of cross-lingual transfer in both zero-shot and few-shot settings. Additionally, although natural language inference is a higher-order reasoning task, our models achieved relatively high accuracy on the JamPatoisNLI dataset by learning the task from MNLI examples in English.
A natural question that arises from these results is whether vocabulary overlap is the primary factor behind the boost in the effectiveness of transfer learning in these experiments, or whether a higher-order notion of similarity is a larger factor. Comparing zero-shot and few-shot accuracies for other languages that are closely related to English but do not share the same degree of vocabulary overlap as an English-based creole (such as German) might be an interesting line of future research.
Interestingly, though Jamaican Patois developed as a result of contact between speakers of English and speakers of West African languages (some of which are present in multilingual BERT's and XLM-RoBERTa's training corpora), the multilingual models were not more effective base pretrained language models than the monolingual models. Another possible direction for future research might be to determine whether there are methods that allow for more effective leveraging of the multilingual characteristics of these models during finetuning for creole target languages.

Conclusion
JamPatoisNLI is a natural language inference dataset in an English-based creole, constructed from existing and novel examples of Jamaican Patois. Our experiments show that the language's relatedness to English significantly boosts the effectiveness of cross-lingual transfer, even for the higher-order task of natural language inference, in both zero-shot and few-shot settings. We hope that the creation of this dataset encourages further research on methods to improve cross-lingual transfer for creole target languages, and the creation of other low-resource and creole language datasets.

Limitations
One limitation of our research relates to the fact that Jamaican Patois is a low-resource language. The sizes of the dataset splits (particularly the validation and test sets) are much smaller than those of high-resource language datasets. Further, the differences observed between the AmericasNLI and JamPatoisNLI results are not necessarily due solely to differences in language similarity to the source languages: another contributing factor might be a difference in difficulty between the two datasets.

A.1 Finetuning with BitFit
BitFit is a sparse, parameter-efficient finetuning method introduced for use with small-to-medium sized training datasets, which involves finetuning only the bias terms of a pretrained language model (Zaken et al., 2021). As an initial approach for few-shot finetuning, we experimented with BitFit using the same hyperparameters described in our prior experiments (in Table 7) for the best JamPatoisNLI model (English BERT uncased unfrozen), but increasing the learning rate by one order of magnitude to 5e-04, as the authors do.
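BitFit's bias-only selection can be sketched as a filter over parameter names. The names below are hypothetical examples following the common '<module>.weight' / '<module>.bias' convention of transformer implementations:

```python
def bitfit_trainable(param_names):
    """Under BitFit, only bias terms are updated during finetuning;
    all other parameters of the pretrained model stay frozen."""
    return [name for name in param_names if name.endswith("bias")]

# Hypothetical parameter names for illustration.
names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.dense.weight",
    "encoder.layer.0.output.dense.bias",
    "classifier.weight",
    "classifier.bias",
]
trainable = bitfit_trainable(names)
```

In a framework like PyTorch, the same idea would set `requires_grad` to True only for the bias parameters.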
In Table 12, we present the results for few-shot finetuning using the BitFit method (Zaken et al., 2021) in comparison with the vanilla finetuning method (in which all model parameters are left unfrozen). In the zero-shot setting and in the cases where there are a small number of few-shot examples, the two techniques perform similarly, but BitFit begins to underperform relative to the vanilla method with more few-shot examples.

Entailment.
(a) Given the premise, a reasonable reader would conclude that the hypothesis must also be true.
(b) The hypothesis is necessarily consistent with the premise.
(c) If a speaker holds the sentiment or opinion expressed in the premise, then a reasonable reader would conclude that they also hold the sentiment or opinion expressed in the hypothesis.

Contradiction.
(a) Given the premise, a reasonable reader would conclude that the hypothesis must be false.
(b) The hypothesis is necessarily inconsistent with the premise.
(c) If a speaker holds the sentiment or opinion expressed in the premise, then a reasonable reader would conclude that they do not hold the sentiment or opinion expressed in the hypothesis.

Neutral.
(a) Given the premise, a reasonable reader would conclude that the hypothesis could be either true or false.
(b) The hypothesis is neither necessarily inconsistent nor necessarily consistent with the premise.
(c) If a speaker holds the sentiment or opinion expressed in the premise, then a reasonable reader would conclude that it may or may not be true that they hold the sentiment or opinion expressed in the hypothesis.
Figure 4: Labelling criteria used to generate each hypothesis based on the premise, and given as labelling guidelines to dataset validators.

Figure 1: Linguistic features relevant for textual entailment classification for Jamaican Patois and lexical overlap with English.

Figure 3: Plots for the best AmericasNLI model (xlm-unfrozen) on each language, and the best JamPatoisNLI model (roberta-unfrozen). Experiments are averaged over three seeds and the best models were chosen based on results for the validation set.

Table 2: Example of varied spelling of Patois words present in the dataset.

Table 3: Markers of negation in Jamaican Patois.

Table 4: Random sample selected from the 100 double-annotated examples in the corpus, with their gold labels and validation labels (abbreviated E, N, C) by each of the annotators.

Table 5: Statistics across the 650 examples in the dataset, by class and in aggregate.

Table 6: Inter-annotator agreement. We count a classification as accurate if both annotators agreed with the original annotations in the dataset.

Table 9: Zero-shot and few-shot accuracies for different models evaluated on JamPatoisNLI, averaged over three experiments with different seeds. The best models were chosen based on results for the validation set.

Table 10: Test set accuracies for the best BERT-based and RoBERTa-based models on the JamPatoisNLI dataset.

Table 11: Sample from the Jamaican Patois to English transition dataset. The final example is in English, and we present predictions made by three models finetuned with our Patois few-shot training dataset using the parameters for the best JamPatoisNLI model in Table 7.

Table 12: Comparison of zero-shot and few-shot finetuning using BitFit and the vanilla finetuning technique. Experiments are averaged over three seeds, and are reported on the test dataset.

Table 13: Sources for premises in the dataset.

Table 14: Languages used from the AmericasNLI dataset and the sizes of the original splits.

Table 15: Values used for the few-shot hyperparameter sweep. Experiments are averaged over three seeds.

Table 16: Hyperparameters used for model type selection. Experiments are averaged over three seeds.