Why only Micro-F1? Class Weighting of Measures for Relation Classification

David Harbecke; Yuxuan Chen; Leonhard Hennig; Christoph Alt

doi:10.18653/v1/2022.nlppower-1.4

Abstract

Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.
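
The sketch below makes the contrast between weighting schemes concrete. It is written in Python using only the standard library. It computes micro-F1 by pooling true positives, false positives and false negatives over all scored classes. It also averages per-class F1 with weights proportional to class support raised to a power `alpha`. The helper names and the `alpha` parameterization are hypothetical illustrations and are not necessarily the intermediate schemes defined in the paper. With `alpha = 0` the average is macro-F1, with `alpha = 1` it is support-weighted F1, and values in between interpolate. The `labels` argument lets you leave out a negative class, as is common in relation classification evaluation.

```python
def per_class_counts(gold, pred, labels):
    """Return {label: (tp, fp, fn)} for every label in `labels`.

    Labels outside `labels` (e.g. a hypothetical "no_relation" class)
    are never scored, but still cause FPs/FNs for scored classes.
    """
    counts = {c: [0, 0, 0] for c in labels}
    for g, p in zip(gold, pred):
        if g == p:
            if g in counts:
                counts[g][0] += 1  # true positive
        else:
            if p in counts:
                counts[p][1] += 1  # false positive for the predicted label
            if g in counts:
                counts[g][2] += 1  # false negative for the gold label
    return {c: tuple(v) for c, v in counts.items()}


def f1(tp, fp, fn):
    """F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Pool counts over all scored classes, then compute a single F1."""
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    return f1(tp, fp, fn)


def weighted_f1(counts, alpha):
    """Average per-class F1 with weights proportional to support**alpha.

    Hypothetical interpolation: alpha=0 -> macro-F1, alpha=1 ->
    support-weighted F1. Classes with zero gold support are skipped.
    """
    weights = {c: (tp + fn) ** alpha
               for c, (tp, fp, fn) in counts.items() if tp + fn > 0}
    total = sum(weights.values())
    if not total:
        return 0.0
    return sum(w * f1(*counts[c]) for c, w in weights.items()) / total


# Toy imbalanced data: the model never predicts the rare class "B".
gold = ["A"] * 9 + ["B"]
pred = ["A"] * 10
counts = per_class_counts(gold, pred, labels=["A", "B"])
print(round(micro_f1(counts), 3))          # 0.9
print(round(weighted_f1(counts, 0.0), 3))  # macro-F1: 0.474
print(round(weighted_f1(counts, 0.5), 3))  # 0.711
print(round(weighted_f1(counts, 1.0), 3))  # 0.853
```

On this toy set, micro-F1 is 0.9. Because every class is scored, it equals accuracy here. It hides the complete failure on class `B`, which macro-F1 (0.474) exposes. The intermediate weightings fall between the two extremes.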

Anthology ID: 2022.nlppower-1.4
Volume: Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Tatiana Shavrina, Vladislav Mikhailov, Valentin Malykh, Ekaterina Artemova, Oleg Serikov, Vitaly Protasov
Venue: nlppower
Publisher: Association for Computational Linguistics
Pages: 32–41
URL: https://aclanthology.org/2022.nlppower-1.4/
DOI: 10.18653/v1/2022.nlppower-1.4
Bibkey: harbecke-etal-2022-micro
Cite (ACL): David Harbecke, Yuxuan Chen, Leonhard Hennig, and Christoph Alt. 2022. Why only Micro-F1? Class Weighting of Measures for Relation Classification. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pages 32–41, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Why only Micro-F1? Class Weighting of Measures for Relation Classification (Harbecke et al., nlppower 2022)
PDF: https://aclanthology.org/2022.nlppower-1.4.pdf
Video: https://aclanthology.org/2022.nlppower-1.4.mp4

Export citation

BibTeX
MODS XML
Endnote
Preformatted

@inproceedings{harbecke-etal-2022-micro,
    title = "Why only Micro-F1? Class Weighting of Measures for Relation Classification",
    author = "Harbecke, David  and
      Chen, Yuxuan  and
      Hennig, Leonhard  and
      Alt, Christoph",
    editor = "Shavrina, Tatiana  and
      Mikhailov, Vladislav  and
      Malykh, Valentin  and
      Artemova, Ekaterina  and
      Serikov, Oleg  and
      Protasov, Vitaly",
    booktitle = "Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.nlppower-1.4/",
    doi = "10.18653/v1/2022.nlppower-1.4",
    pages = "32--41",
    abstract = "Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="harbecke-etal-2022-micro">
    <titleInfo>
        <title>Why only Micro-F1? Class Weighting of Measures for Relation Classification</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">David</namePart>
        <namePart type="family">Harbecke</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yuxuan</namePart>
        <namePart type="family">Chen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Leonhard</namePart>
        <namePart type="family">Hennig</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Christoph</namePart>
        <namePart type="family">Alt</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tatiana</namePart>
            <namePart type="family">Shavrina</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vladislav</namePart>
            <namePart type="family">Mikhailov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Valentin</namePart>
            <namePart type="family">Malykh</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ekaterina</namePart>
            <namePart type="family">Artemova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Oleg</namePart>
            <namePart type="family">Serikov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vitaly</namePart>
            <namePart type="family">Protasov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Dublin, Ireland</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.</abstract>
    <identifier type="citekey">harbecke-etal-2022-micro</identifier>
    <identifier type="doi">10.18653/v1/2022.nlppower-1.4</identifier>
    <location>
        <url>https://aclanthology.org/2022.nlppower-1.4/</url>
    </location>
    <part>
        <date>2022-05</date>
        <extent unit="page">
            <start>32</start>
            <end>41</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Why only Micro-F1? Class Weighting of Measures for Relation Classification
%A Harbecke, David
%A Chen, Yuxuan
%A Hennig, Leonhard
%A Alt, Christoph
%Y Shavrina, Tatiana
%Y Mikhailov, Vladislav
%Y Malykh, Valentin
%Y Artemova, Ekaterina
%Y Serikov, Oleg
%Y Protasov, Vitaly
%S Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F harbecke-etal-2022-micro
%X Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes, where existing schemes are extremes, and two new intermediate schemes. We show that reporting results of different weighting schemes better highlights strengths and weaknesses of a model.
%R 10.18653/v1/2022.nlppower-1.4
%U https://aclanthology.org/2022.nlppower-1.4/
%U https://doi.org/10.18653/v1/2022.nlppower-1.4
%P 32-41

Markdown (Informal)

[Why only Micro-F1? Class Weighting of Measures for Relation Classification](https://aclanthology.org/2022.nlppower-1.4/) (Harbecke et al., nlppower 2022)

ACL

David Harbecke, Yuxuan Chen, Leonhard Hennig, and Christoph Alt. 2022. Why only Micro-F1? Class Weighting of Measures for Relation Classification. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pages 32–41, Dublin, Ireland. Association for Computational Linguistics.