Label Errors in BANKING77

Cecilia Ying, Stephen Thomas


Abstract
We investigate potential label errors in the popular BANKING77 dataset and their negative impact on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, we found that removing the utterances with potential errors increased our intent classifier's F1-score by 4.5% in supervised classification and its Adjusted Rand Index by 8% in unsupervised classification. This paper serves as a warning about the potential presence of noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets.
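The abstract does not name the two detection approaches, so the following is only a minimal sketch of one plausible pipeline: confident learning via the cleanlab library, with a TF-IDF + logistic-regression classifier as a stand-in model. The classifier choice and the "banking77" Hugging Face Hub identifier are assumptions made for illustration, not the authors' published setup.

    # Sketch: confident-learning-based label-error detection on BANKING77.
    # NOTE: cleanlab, the TF-IDF + logistic-regression model, and the
    # "banking77" Hub identifier are assumptions, not the paper's pipeline.
    import numpy as np
    from datasets import load_dataset
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.filter import find_label_issues

    train = load_dataset("banking77", split="train")  # 10,003 utterances, 77 intents
    texts = train["text"]
    labels = np.array(train["label"])

    # Bag-of-ngrams features; any classifier exposing predict_proba works here.
    X = TfidfVectorizer(ngram_range=(1, 2), min_df=2).fit_transform(texts)

    # Confident learning requires out-of-sample predicted probabilities,
    # obtained here with 5-fold cross-validation.
    pred_probs = cross_val_predict(
        LogisticRegression(max_iter=1000),
        X, labels, cv=5, method="predict_proba",
    )

    # Indices of utterances whose given label disagrees with the model's
    # confident predictions, ranked from most to least suspicious.
    issue_idx = find_label_issues(
        labels, pred_probs, return_indices_ranked_by="self_confidence"
    )
    print(f"Flagged {len(issue_idx)} of {len(labels)} utterances as possible label errors")

Dropping the flagged utterances before retraining mirrors the "simple experiment" described above, though the number flagged will vary with the model and ranking method used.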
Anthology ID:
2022.insights-1.19
Volume:
Proceedings of the Third Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
139–143
URL:
https://aclanthology.org/2022.insights-1.19
DOI:
10.18653/v1/2022.insights-1.19
Cite (ACL):
Cecilia Ying and Stephen Thomas. 2022. Label Errors in BANKING77. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 139–143, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Label Errors in BANKING77 (Ying & Thomas, insights 2022)
PDF:
https://aclanthology.org/2022.insights-1.19.pdf
Video:
https://aclanthology.org/2022.insights-1.19.mp4
Data
BANKING77