Nat Gillin
2024
One-Shot Prompt for Language Variety Identification
Nat Gillin
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
We present a one-shot prompting approach to multi-class classification for similar language identification using an off-the-shelf pre-trained large language model that was not specifically trained or tuned for the language identification task. Without post-training or fine-tuning the model, we simply include one example per class in the prompt, and, surprisingly, the model is able to generate the language and locale labels accordingly.
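The abstract describes the method only at a high level, so the following is a minimal sketch of what "one example per class" prompting could look like, not the paper's actual prompt. The instruction wording, the `Text:`/`Label:` layout, the helper names `build_one_shot_prompt` and `parse_label`, and the `pt-BR`/`pt-PT` labels are all hypothetical; the call to the language model itself is left out.

```python
from typing import Dict, List, Optional


def build_one_shot_prompt(examples: Dict[str, str], query: str) -> str:
    """Build a prompt with exactly one labelled example per class, then the query.

    Hypothetical layout: an instruction line, one "Text/Label" block per class,
    and a final block whose label is left empty for the model to complete.
    """
    blocks: List[str] = ["Identify the language and locale of each text."]
    for label, text in examples.items():
        blocks.append(f"Text: {text}\nLabel: {label}")
    # The query's label is left blank so the model generates it.
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)


def parse_label(generation: str, labels: List[str]) -> Optional[str]:
    """Map the model's continuation to a known label, or None if it is off-list."""
    stripped = generation.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    return first_line if first_line in labels else None


if __name__ == "__main__":
    one_shot = {  # hypothetical one-example-per-class set
        "pt-BR": "Vou pegar o ônibus.",
        "pt-PT": "Vou apanhar o autocarro.",
    }
    print(build_one_shot_prompt(one_shot, "Comprei um celular novo."))
```

Constraining the output with `parse_label` is one simple way to handle generations that drift outside the label set; how the paper handled such cases is not stated in the abstract.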
2023
Few-shot Spanish-Aymara Machine Translation Using English-Aymara Lexicon
Nat Gillin | Brian Gummibaerhausen
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
This paper presents experiments in training a Spanish-Aymara machine translation model for the AmericasNLP 2023 Machine Translation shared task. We added the English-Aymara GlobalVoices corpus and an English-Aymara lexicon to the training data, and deliberately limited our training resources so that the model is trained in a few-shot manner.
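One common way to use a bilingual lexicon in MT training is to append its entries as short word-level parallel pairs alongside the sentence-level corpus. The abstract does not say how the lexicon was actually incorporated (or how the English-side resources were bridged to Spanish), so the sketch below is only an assumed illustration of that general technique. The function name `merge_corpus_and_lexicon` and the placeholder Aymara strings are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source text, target text)


def merge_corpus_and_lexicon(
    corpus: List[Pair], lexicon: Dict[str, List[str]]
) -> List[Pair]:
    """Combine sentence pairs with lexicon entries treated as one-word pairs.

    Assumption: each lexicon key is a source word mapped to one or more target
    words. Exact duplicate pairs are dropped; input order is preserved.
    """
    seen: Set[Pair] = set()
    merged: List[Pair] = []

    def add(src: str, tgt: str) -> None:
        key = (src.strip(), tgt.strip())
        if key not in seen:
            seen.add(key)
            merged.append(key)

    # Sentence-level parallel data first (e.g. a GlobalVoices-style corpus).
    for src, tgt in corpus:
        add(src, tgt)
    # Then every lexicon translation as its own short training pair.
    for src_word, translations in lexicon.items():
        for tgt_word in translations:
            add(src_word, tgt_word)
    return merged


if __name__ == "__main__":
    corpus = [("good morning", "<aymara sentence 1>")]
    lexicon = {"water": ["<aymara word 1>"], "house": ["<aymara word 2>"]}
    for pair in merge_corpus_and_lexicon(corpus, lexicon):
        print(pair)
```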
2022
Is Encoder-Decoder Transformer the Shiny Hammer?
Nat Gillin
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
We present an approach to multi-class classification using an encoder-decoder transformer model. We trained a network to identify French varieties using the same scripts we use to train an encoder-decoder machine translation model. With slight modifications to the data preparation and inference parameters, we show that the tools used for machine translation can easily be re-used to achieve competitive classification performance. On the French Dialectal Identification (FDI) task, our model scored 32.4 weighted F1, which falls well short of a simple naive Bayes classifier that outperforms the neural encoder-decoder model with 41.27 weighted F1.
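To make "classification as translation" concrete, here is a minimal sketch under stated assumptions: each labelled text becomes a source/target pair whose target is a single label token, the decoded output is mapped back to a label, and predictions are scored with support-weighted F1 (the metric reported above). The `<LABEL>` token format and the helper names are hypothetical, and the FR/BE/CH labels are illustrative; the paper's actual data preparation and training scripts are not reproduced here.

```python
from collections import Counter
from typing import List, Optional, Tuple


def to_seq2seq_pairs(examples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (text, label) examples into (source, target) pairs for an MT toolkit.

    Hypothetical convention: whitespace is normalised and the target side is a
    single token such as "<FR>".
    """
    return [(" ".join(text.split()), f"<{label}>") for text, label in examples]


def label_from_output(decoded: str) -> Optional[str]:
    """Recover the label from a decoded target string, or None if malformed."""
    tokens = decoded.split()
    if not tokens:
        return None
    tok = tokens[0]
    return tok[1:-1] if tok.startswith("<") and tok.endswith(">") else None


def weighted_f1(gold: List[str], pred: List[str]) -> float:
    """Per-class F1 averaged with weights proportional to each class's gold support."""
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


if __name__ == "__main__":
    print(to_seq2seq_pairs([("Il fait  beau.", "FR")]))
    print(weighted_f1(["FR", "FR", "BE", "CH"], ["FR", "BE", "BE", "CH"]))
```

With a single-token target, decoding can be limited to a very short maximum output length, which is the kind of inference-parameter change the abstract alludes to.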