Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus

Kalvin Hartwig; Evan Lucas; Timothy Havens

doi:10.18653/v1/2023.americasnlp-1.8

Abstract

The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method of using support vector machines to classify two different dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90% across a five-fold cross-validation and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word-level data set are released openly on GitHub at [link to be inserted for final version, working demonstration notebook uploaded with paper].
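
The abstract names support vector machines, five-fold cross-validation, and a sentence-trained model applied to single words, but this page gives no implementation details. The following is a minimal sketch of that general workflow, not the authors' released code. Everything in it is an assumption made for illustration: the character n-gram features, the Pegasos-style training loop (swapped in as a dependency-free way to fit a linear SVM), the function names, and the toy strings, which are synthetic placeholders and not Ojibwe text.

```python
import random
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams inside each whitespace-delimited token."""
    feats = Counter()
    for token in text.lower().split():
        for n in range(n_min, n_max + 1):
            for i in range(len(token) - n + 1):
                feats[token[i:i + n]] += 1
    return feats


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the hinge loss.

    labels must be +1 or -1; lam is the L2 regularisation strength.
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)      # decaying learning rate
            margin = labels[i] * dot(weights, samples[i])
            scale = 1.0 - 1.0 / step      # equals 1 - eta * lam
            for k in weights:
                weights[k] *= scale       # regularisation shrink
            if margin < 1.0:              # hinge loss is active
                for k, v in samples[i].items():
                    weights[k] = weights.get(k, 0.0) + eta * labels[i] * v
    return weights


def predict(weights, feats):
    """Return +1 or -1 according to the sign of the decision score."""
    return 1 if dot(weights, feats) >= 0 else -1


def cross_validate(texts, labels, k=5, seed=0):
    """Mean accuracy over k shuffled folds."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    feats = [char_ngrams(t) for t in texts]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_linear_svm([feats[i] for i in train],
                                 [labels[i] for i in train], seed=seed)
        correct = sum(predict(model, feats[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Synthetic placeholder "sentences" (NOT Ojibwe): the two classes use
# disjoint alphabets so the toy problem is separable by construction.
pos_tokens = ["ala", "mal", "lam", "alma", "mala"]
neg_tokens = ["oro", "tor", "rot", "otto", "toro"]
pos_sents = [f"{a} {b}" for a in pos_tokens for b in pos_tokens if a != b][:10]
neg_sents = [f"{a} {b}" for a in neg_tokens for b in neg_tokens if a != b][:10]
sentences = pos_sents + neg_sents
labels = [1] * len(pos_sents) + [-1] * len(neg_sents)

# Sentence-level five-fold cross-validation.
cv_accuracy = cross_validate(sentences, labels, k=5)

# Train on all sentences, then apply the model to individual words.
model = train_linear_svm([char_ngrams(s) for s in sentences], labels)
word_preds = [predict(model, char_ngrams(w)) for w in ["lama", "roto"]]
print(cv_accuracy, word_preds)
```

Because n-grams are extracted within tokens, a model trained on sentences can score a single word in the same feature space. The toy classes are trivially separable, so the score here says nothing about the accuracy to expect on real dialect data.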

Anthology ID: 2023.americasnlp-1.8
Volume: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
Venue: AmericasNLP
Publisher: Association for Computational Linguistics
Pages: 58–66
URL: https://aclanthology.org/2023.americasnlp-1.8/
DOI: 10.18653/v1/2023.americasnlp-1.8
Bibkey: hartwig-etal-2023-identification
Cite (ACL): Kalvin Hartwig, Evan Lucas, and Timothy Havens. 2023. Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 58–66, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus (Hartwig et al., AmericasNLP 2023)
PDF: https://aclanthology.org/2023.americasnlp-1.8.pdf

Export citation

BibTeX

@inproceedings{hartwig-etal-2023-identification,
    title = "Identification of Dialect for Eastern and {S}outhwestern {O}jibwe Words Using a Small Corpus",
    author = "Hartwig, Kalvin  and
      Lucas, Evan  and
      Havens, Timothy",
    editor = "Mager, Manuel  and
      Ebrahimi, Abteen  and
      Oncevay, Arturo  and
      Rice, Enora  and
      Rijhwani, Shruti  and
      Palmer, Alexis  and
      Kann, Katharina",
    booktitle = "Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.americasnlp-1.8/",
    doi = "10.18653/v1/2023.americasnlp-1.8",
    pages = "58--66",
    abstract = "The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method of using support vector machines to classify two different dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90{\%} across a five-fold cross validation and 72{\%} when the sentence-trained model is applied to a data set of individual words. Our code and the word level data set are released openly on Github at [link to be inserted for final version, working demonstration notebook uploaded with paper]."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="hartwig-etal-2023-identification">
    <titleInfo>
        <title>Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Kalvin</namePart>
        <namePart type="family">Hartwig</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Evan</namePart>
        <namePart type="family">Lucas</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Timothy</namePart>
        <namePart type="family">Havens</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Manuel</namePart>
            <namePart type="family">Mager</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Abteen</namePart>
            <namePart type="family">Ebrahimi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Arturo</namePart>
            <namePart type="family">Oncevay</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Enora</namePart>
            <namePart type="family">Rice</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Shruti</namePart>
            <namePart type="family">Rijhwani</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alexis</namePart>
            <namePart type="family">Palmer</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Katharina</namePart>
            <namePart type="family">Kann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Toronto, Canada</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method of using support vector machines to classify two different dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90% across a five-fold cross validation and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word level data set are released openly on Github at [link to be inserted for final version, working demonstration notebook uploaded with paper].</abstract>
    <identifier type="citekey">hartwig-etal-2023-identification</identifier>
    <identifier type="doi">10.18653/v1/2023.americasnlp-1.8</identifier>
    <location>
        <url>https://aclanthology.org/2023.americasnlp-1.8/</url>
    </location>
    <part>
        <date>2023-07</date>
        <extent unit="page">
            <start>58</start>
            <end>66</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus
%A Hartwig, Kalvin
%A Lucas, Evan
%A Havens, Timothy
%Y Mager, Manuel
%Y Ebrahimi, Abteen
%Y Oncevay, Arturo
%Y Rice, Enora
%Y Rijhwani, Shruti
%Y Palmer, Alexis
%Y Kann, Katharina
%S Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F hartwig-etal-2023-identification
%X The Ojibwe language has several dialects that vary to some degree in both spoken and written form. We present a method of using support vector machines to classify two different dialects (Eastern and Southwestern Ojibwe) using a very small corpus of text. Classification accuracy at the sentence level is 90% across a five-fold cross validation and 72% when the sentence-trained model is applied to a data set of individual words. Our code and the word level data set are released openly on Github at [link to be inserted for final version, working demonstration notebook uploaded with paper].
%R 10.18653/v1/2023.americasnlp-1.8
%U https://aclanthology.org/2023.americasnlp-1.8/
%U https://doi.org/10.18653/v1/2023.americasnlp-1.8
%P 58-66

Markdown (Informal)

[Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus](https://aclanthology.org/2023.americasnlp-1.8/) (Hartwig et al., AmericasNLP 2023)

ACL

Kalvin Hartwig, Evan Lucas, and Timothy Havens. 2023. Identification of Dialect for Eastern and Southwestern Ojibwe Words Using a Small Corpus. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 58–66, Toronto, Canada. Association for Computational Linguistics.