2023
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Atticus Harrigan | Aditi Chaudhary | Shruti Rijhwani | Sarah Moeller | Antti Arppe | Alexis Palmer | Ryan Henke | Daisy Rosenblum
Finding words that aren’t there: Using word embeddings to improve dictionary search for low-resource languages
Antti Arppe | Andrew Neitsch | Daniel Dacanay | Jolene Poulin | Daniel Hieber | Atticus Harrigan
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Modern machine learning techniques have produced many impressive results in language technology, but these techniques generally require an amount of training data that is many orders of magnitude greater than what exists for low-resource languages in general, and endangered ones in particular. However, dictionary definitions in a comparatively much more well-resourced majority language can provide a link between low-resource languages and machine learning models trained on massive amounts of majority-language data. By leveraging a pre-trained English word embedding to compute sentence embeddings for definitions in bilingual dictionaries for four Indigenous languages spoken in North America, Plains Cree (nêhiyawêwin), Arapaho (Hinónoʼeitíít), Northern Haida (X̲aad Kíl), and Tsuut’ina (Tsúùt’ínà), we have obtained promising results for dictionary search. Not only are the search results in the majority language of the definitions more relevant, but they can be semantically relevant in ways not achievable with classic information retrieval techniques: users can perform successful searches for words that do not occur at all in the dictionary. These techniques are directly applicable to any bilingual dictionary providing translations between a high- and low-resource language.
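The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors, the placeholder headwords and the helper names (embed, cosine, search) are all made up for illustration, and averaging word vectors is only one simple way to build a sentence embedding. The sketch shows how a query word that appears in no definition can still retrieve a semantically close entry.

```python
import math

# Toy 3-dimensional "English word embeddings" (invented values, illustration only).
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.0],
    "water": [0.0, 0.1, 0.9],
    "river": [0.1, 0.0, 0.8],
    "small": [0.3, 0.6, 0.1],
    "flowing": [0.1, 0.2, 0.7],
}

# Hypothetical dictionary: placeholder headwords mapped to English definitions.
ENTRIES = {
    "headword_1": "small dog",
    "headword_2": "flowing water",
}


def embed(text):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query):
    """Return headwords ranked by similarity of their definition to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for headword, definition in ENTRIES.items():
        d = embed(definition)
        if d is not None:
            scored.append((cosine(q, d), headword))
    return [headword for _, headword in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # "puppy" occurs in no definition, yet the "small dog" entry ranks first.
    print(search("puppy"))
```

In a real setting the toy table would be replaced by a pre-trained English embedding and the definitions by those of an actual bilingual dictionary.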
2022
Interactive Word Completion for Plains Cree
William Lane | Atticus Harrigan | Antti Arppe
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The composition of richly-inflected words in morphologically complex languages can be a challenge for language learners developing literacy. Accordingly, Lane and Bird (2020) proposed a finite state approach which maps prefixes in a language to a set of possible completions up to the next morpheme boundary, for the incremental building of complex words. In this work, we develop an approach to morph-based auto-completion based on a finite state morphological analyzer of Plains Cree (nêhiyawêwin), showing the portability of the concept to a much larger, more complete morphological transducer. Additionally, we propose and compare various novel ranking strategies on the morph auto-complete output. The best weighting scheme ranks the target completion in the top 10 results in 64.9% of queries, and in the top 50 in 73.9% of queries.
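As a rough illustration of completing a typed prefix up to the next morpheme boundary, the sketch below uses a plain list of pre-segmented words rather than a finite state transducer, so it shows the mapping only, not the approach of the paper. The English-like segmented forms and the function name complete are assumptions chosen for readability, and no ranking of the completions is attempted.

```python
# Hypothetical pre-segmented lexicon; each tuple lists the morphs of one word.
SEGMENTED = [
    ("un", "lock", "ed"),
    ("un", "lock", "ing"),
    ("un", "do"),
    ("re", "lock"),
]


def complete(prefix):
    """Return the distinct completions of prefix up to the next morph boundary."""
    completions = set()
    for morphs in SEGMENTED:
        word = "".join(morphs)
        if not word.startswith(prefix):
            continue
        end = 0
        for morph in morphs:
            end += len(morph)
            # The first boundary strictly past the typed prefix is the target.
            if end > len(prefix):
                completions.add(word[:end])
                break
    return sorted(completions)


if __name__ == "__main__":
    for typed in ["", "un", "unl", "unlock"]:
        print(repr(typed), complete(typed))
```

Each call offers only the next morph-sized step, so a learner builds a long word incrementally instead of choosing among every fully inflected form at once.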
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Sarah Moeller | Antonios Anastasopoulos | Antti Arppe | Aditi Chaudhary | Atticus Harrigan | Josh Holden | Jordan Lachler | Alexis Palmer | Shruti Rijhwani | Lane Schwartz
2021
Leveraging English Word Embeddings for Semi-Automatic Semantic Classification in Nêhiyawêwin (Plains Cree)
Atticus Harrigan | Antti Arppe
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
This paper details a semi-automatic method of word clustering for the Algonquian language, Nêhiyawêwin (Plains Cree). Although this method worked well, particularly for nouns, it required some amount of manual postprocessing. The main benefit of this approach over implementing an existing classification ontology is that this method approaches the language from an endogenous point of view, while performing classification quicker than in a fully manual context.
The More Detail, the Better? – Investigating the Effects of Semantic Ontology Specificity on Vector Semantic Classification with a Plains Cree / nêhiyawêwin Dictionary
Daniel Dacanay | Atticus Harrigan | Arok Wolvengrey | Antti Arppe
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
One problem in the task of automatic semantic classification is the problem of determining the level on which to group lexical items. This is often accomplished using pre-made, hierarchical semantic ontologies. The following investigation explores the computational assignment of semantic classifications on the contents of a dictionary of nêhiyawêwin / Plains Cree (ISO: crk, Algonquian, Western Canada and United States), using a semantic vector space model, and following two semantic ontologies, WordNet and SIL’s Rapid Words, and compares how these computational results compare to manual classifications with the same two ontologies.
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
Antti Arppe | Jeff Good | Atticus Harrigan | Mans Hulden | Jordan Lachler | Sarah Moeller | Alexis Palmer | Miikka Silfverberg | Lane Schwartz
Computational Analysis versus Human Intuition: A Critical Comparison of Vector Semantics with Manual Semantic Classification in the Context of Plains Cree
Daniel Dacanay | Atticus Harrigan | Antti Arppe
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
2020
Design and evaluation of a smartphone keyboard for Plains Cree syllabics
Eddie Antonio Santos | Atticus Harrigan
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Plains Cree is a less-resourced language in Canada. To promote its usage online, we describe previous keyboard layouts for typing Plains Cree syllabics on smartphones. We describe our own solution whose development was guided by ergonomics research and corpus statistics. We then describe a case study in which three participants used a previous layout and our own, and we collected quantitative and qualitative data. We conclude that, despite observing accuracy improvements in user testing, introducing a brand new paradigm for typing Plains Cree syllabics may not be ideal for the community.
2019
A Preliminary Plains Cree Speech Synthesizer
Atticus Harrigan | Antti Arppe | Timothy Mills
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)