BBPOS: BERT-based Part-of-Speech Tagging for Uzbek

Latofat Bobojonova; Arofat Akhundjanova; Phil Sidney Ostheimer; Sophie Fellenz

Abstract

This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.

Anthology ID: 2025.loreslm-1.23
Volume: Proceedings of the First Workshop on Language Models for Low-Resource Languages
Month: January
Year: 2025
Address: Abu Dhabi, United Arab Emirates
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venues: LoResLM | WS
Publisher: Association for Computational Linguistics
Pages: 287–293
URL: https://aclanthology.org/2025.loreslm-1.23/
Bibkey: bobojonova-etal-2025-bbpos
Cite (ACL): Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer, and Sophie Fellenz. 2025. BBPOS: BERT-based Part-of-Speech Tagging for Uzbek. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 287–293, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): BBPOS: BERT-based Part-of-Speech Tagging for Uzbek (Bobojonova et al., LoResLM 2025)
PDF: https://aclanthology.org/2025.loreslm-1.23.pdf
Optional supplementary material: 2025.loreslm-1.23.OptionalSupplementaryMaterial.zip

Export citation

@inproceedings{bobojonova-etal-2025-bbpos,
    title = "{BBPOS}: {BERT}-based Part-of-Speech Tagging for {U}zbek",
    author = "Bobojonova, Latofat  and
      Akhundjanova, Arofat  and
      Ostheimer, Phil Sidney  and
      Fellenz, Sophie",
    editor = "Hettiarachchi, Hansi  and
      Ranasinghe, Tharindu  and
      Rayson, Paul  and
      Mitkov, Ruslan  and
      Gaber, Mohamed  and
      Premasiri, Damith  and
      Tan, Fiona Anting  and
      Uyangodage, Lasitha",
    booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.loreslm-1.23/",
    pages = "287--293",
    abstract = "This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91{\%} average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="bobojonova-etal-2025-bbpos">
    <titleInfo>
        <title>BBPOS: BERT-based Part-of-Speech Tagging for Uzbek</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Latofat</namePart>
        <namePart type="family">Bobojonova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Arofat</namePart>
        <namePart type="family">Akhundjanova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Phil</namePart>
        <namePart type="given">Sidney</namePart>
        <namePart type="family">Ostheimer</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sophie</namePart>
        <namePart type="family">Fellenz</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-01</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the First Workshop on Language Models for Low-Resource Languages</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Hansi</namePart>
            <namePart type="family">Hettiarachchi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Tharindu</namePart>
            <namePart type="family">Ranasinghe</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Paul</namePart>
            <namePart type="family">Rayson</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ruslan</namePart>
            <namePart type="family">Mitkov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mohamed</namePart>
            <namePart type="family">Gaber</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Damith</namePart>
            <namePart type="family">Premasiri</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Fiona</namePart>
            <namePart type="given">Anting</namePart>
            <namePart type="family">Tan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lasitha</namePart>
            <namePart type="family">Uyangodage</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.</abstract>
    <identifier type="citekey">bobojonova-etal-2025-bbpos</identifier>
    <location>
        <url>https://aclanthology.org/2025.loreslm-1.23/</url>
    </location>
    <part>
        <date>2025-01</date>
        <extent unit="page">
            <start>287</start>
            <end>293</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
%A Bobojonova, Latofat
%A Akhundjanova, Arofat
%A Ostheimer, Phil Sidney
%A Fellenz, Sophie
%Y Hettiarachchi, Hansi
%Y Ranasinghe, Tharindu
%Y Rayson, Paul
%Y Mitkov, Ruslan
%Y Gaber, Mohamed
%Y Premasiri, Damith
%Y Tan, Fiona Anting
%Y Uyangodage, Lasitha
%S Proceedings of the First Workshop on Language Models for Low-Resource Languages
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F bobojonova-etal-2025-bbpos
%X This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.
%U https://aclanthology.org/2025.loreslm-1.23/
%P 287-293

Markdown (Informal)

[BBPOS: BERT-based Part-of-Speech Tagging for Uzbek](https://aclanthology.org/2025.loreslm-1.23/) (Bobojonova et al., LoResLM 2025)
