Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task

Marek Šuppa; Ondrej Jariabka

Abstract

In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: character level language model with Stanza, language-specific BERT-style models with SpaCy and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.

Anthology ID: 2021.bsnlp-1.13
Volume: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Bogdan Babych, Olga Kanishcheva, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Vasyl Starko, Josef Steinberger, Roman Yangarber, Michał Marcińczuk, Senja Pollak, Pavel Přibáň, Marko Robnik-Šikonja
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 105–114
URL: https://aclanthology.org/2021.bsnlp-1.13/
Cite (ACL): Marek Suppa and Ondrej Jariabka. 2021. Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 105–114, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task (Suppa & Jariabka, BSNLP 2021)
PDF: https://aclanthology.org/2021.bsnlp-1.13.pdf

Export citation

BibTeX

@inproceedings{suppa-jariabka-2021-benchmarking,
    title = "Benchmarking Pre-trained Language Models for Multilingual {NER}: {T}ra{S}pa{S} at the {BSNLP}2021 Shared Task",
    author = "Suppa, Marek  and
      Jariabka, Ondrej",
    editor = "Babych, Bogdan  and
      Kanishcheva, Olga  and
      Nakov, Preslav  and
      Piskorski, Jakub  and
      Pivovarova, Lidia  and
      Starko, Vasyl  and
      Steinberger, Josef  and
      Yangarber, Roman  and
      Marci{\'n}czuk, Micha{\l}  and
      Pollak, Senja  and
      P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
      Robnik-{\v{S}}ikonja, Marko",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bsnlp-1.13/",
    pages = "105--114",
    abstract = "In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: character level language model with Stanza, language-specific BERT-style models with SpaCy and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at \url{https://github.com/NaiveNeuron/slavner-2021}."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="suppa-jariabka-2021-benchmarking">
    <titleInfo>
        <title>Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Marek</namePart>
        <namePart type="family">Suppa</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ondrej</namePart>
        <namePart type="family">Jariabka</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2021-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Bogdan</namePart>
            <namePart type="family">Babych</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Olga</namePart>
            <namePart type="family">Kanishcheva</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Preslav</namePart>
            <namePart type="family">Nakov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jakub</namePart>
            <namePart type="family">Piskorski</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lidia</namePart>
            <namePart type="family">Pivovarova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vasyl</namePart>
            <namePart type="family">Starko</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Josef</namePart>
            <namePart type="family">Steinberger</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Roman</namePart>
            <namePart type="family">Yangarber</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Michał</namePart>
            <namePart type="family">Marcińczuk</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Senja</namePart>
            <namePart type="family">Pollak</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Pavel</namePart>
            <namePart type="family">Přibáň</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marko</namePart>
            <namePart type="family">Robnik-Šikonja</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Kyiv, Ukraine</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: character level language model with Stanza, language-specific BERT-style models with SpaCy and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.</abstract>
    <identifier type="citekey">suppa-jariabka-2021-benchmarking</identifier>
    <location>
        <url>https://aclanthology.org/2021.bsnlp-1.13/</url>
    </location>
    <part>
        <date>2021-04</date>
        <extent unit="page">
            <start>105</start>
            <end>114</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task
%A Suppa, Marek
%A Jariabka, Ondrej
%Y Babych, Bogdan
%Y Kanishcheva, Olga
%Y Nakov, Preslav
%Y Piskorski, Jakub
%Y Pivovarova, Lidia
%Y Starko, Vasyl
%Y Steinberger, Josef
%Y Yangarber, Roman
%Y Marcińczuk, Michał
%Y Pollak, Senja
%Y Přibáň, Pavel
%Y Robnik-Šikonja, Marko
%S Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
%D 2021
%8 April
%I Association for Computational Linguistics
%C Kyiv, Ukraine
%F suppa-jariabka-2021-benchmarking
%X In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: character level language model with Stanza, language-specific BERT-style models with SpaCy and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.
%U https://aclanthology.org/2021.bsnlp-1.13/
%P 105-114

Markdown (Informal)

[Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task](https://aclanthology.org/2021.bsnlp-1.13/) (Suppa & Jariabka, BSNLP 2021)
