David Sides
2020
Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length
Jacqueline Brixey, David Sides, Timothy Vizthum, David Traum, Khalil Iskarous
Proceedings of the Twelfth Language Resources and Evaluation Conference
This work introduces additions to ChoCo, a multimodal corpus for the American Indigenous language Choctaw. Using texts from the corpus, we develop new computational resources with two off-the-shelf tools: word2vec and Linguistica. Our work illustrates how these tools can be successfully applied to a small corpus.