GEval: Tool for Debugging NLP Datasets and Models

Filip Graliński; Anna Wróblewska; Tomasz Stanisławek; Kamil Grabowski; Tomasz Górecki

doi:10.18653/v1/W19-4826

Abstract

This paper presents a simple but general and effective method to debug the output of machine learning (ML) supervised models, including neural networks. The algorithm looks for features that lower the evaluation metric in such a way that it cannot be ascribed to chance (as measured by their p-values). Using this method – implemented as MLEval tool – you can find: (1) anomalies in test sets, (2) issues in preprocessing, (3) problems in the ML model itself. It can give you an insight into what can be improved in the datasets and/or the model. The same method can be used to compare ML models or different versions of the same model. We present the tool, the theory behind it and use cases for text-based models of various types.
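
The idea in the abstract, flagging features whose presence coincides with lower per-item scores and ranking them by p-value, can be illustrated with a minimal sketch. This is not the tool's implementation: the abstract does not name the statistical test, so the one-sided Mann-Whitney U test (normal approximation, no tie correction of the variance) is an assumption made here for illustration, as are the names `mann_whitney_p`, `worst_features`, and `min_count`, and the choice of word tokens as features.

```python
import math
from collections import defaultdict


def mann_whitney_p(with_feature, without_feature):
    """One-sided p-value that scores in `with_feature` tend to be lower.

    Assumption: normal approximation to the Mann-Whitney U statistic,
    average ranks for ties, no tie correction of the variance.
    """
    n1, n2 = len(with_feature), len(without_feature)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted(
        [(s, 0) for s in with_feature] + [(s, 1) for s in without_feature]
    )
    rank_sum_with = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        in_group = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_with += avg_rank * in_group
        i = j
    u = rank_sum_with - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / std
    # Lower tail of the standard normal: a small U means low ranks.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def worst_features(items, min_count=2):
    """Rank features by how strongly they coincide with low scores.

    `items` is a list of (tokens, score) pairs; a higher score is better.
    Returns (p_value, feature, count) tuples, most suspicious first.
    """
    by_feature = defaultdict(set)
    for idx, (tokens, _) in enumerate(items):
        for tok in set(tokens):
            by_feature[tok].add(idx)
    results = []
    for feat, idxs in by_feature.items():
        if len(idxs) < min_count:
            continue
        inside = [items[i][1] for i in idxs]
        outside = [s for i, (_, s) in enumerate(items) if i not in idxs]
        results.append((mann_whitney_p(inside, outside), feat, len(idxs)))
    return sorted(results)


# Toy data: every item containing "foo" was scored 0.0 by the metric.
items = [
    (["the", "cat"], 1.0),
    (["a", "dog"], 1.0),
    (["the", "bird"], 1.0),
    (["foo", "cat"], 0.0),
    (["foo", "dog"], 0.0),
    (["foo", "bird"], 0.0),
    (["a", "fish"], 1.0),
    (["the", "fish"], 1.0),
]
ranking = worst_features(items)
```

On the toy data, the first entry of `ranking` is the token `foo`, since it is the only feature whose items all score low. With many candidate features, some small p-values will arise by chance, so a real analysis would likely need a multiple-comparison correction.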

Anthology ID: W19-4826
Volume: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Month: August
Year: 2019
Address: Florence, Italy
Editors: Tal Linzen, Grzegorz Chrupała, Yonatan Belinkov, Dieuwke Hupkes
Venue: BlackboxNLP
Publisher: Association for Computational Linguistics
Pages: 254–262
URL: https://aclanthology.org/W19-4826/
DOI: 10.18653/v1/W19-4826
Bibkey: gralinski-etal-2019-geval
Cite (ACL): Filip Graliński, Anna Wróblewska, Tomasz Stanisławek, Kamil Grabowski, and Tomasz Górecki. 2019. GEval: Tool for Debugging NLP Datasets and Models. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 254–262, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): GEval: Tool for Debugging NLP Datasets and Models (Graliński et al., BlackboxNLP 2019)
PDF: https://aclanthology.org/W19-4826.pdf

BibTeX

@inproceedings{gralinski-etal-2019-geval,
    title = "{GE}val: Tool for Debugging {NLP} Datasets and Models",
    author = "Grali{\'n}ski, Filip  and
      Wr{\'o}blewska, Anna  and
      Stanis{\l}awek, Tomasz  and
      Grabowski, Kamil  and
      G{\'o}recki, Tomasz",
    editor = "Linzen, Tal  and
      Chrupa{\l}a, Grzegorz  and
      Belinkov, Yonatan  and
      Hupkes, Dieuwke",
    booktitle = "Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-4826/",
    doi = "10.18653/v1/W19-4826",
    pages = "254--262",
    abstract = "This paper presents a simple but general and effective method to debug the output of machine learning (ML) supervised models, including neural networks. The algorithm looks for features that lower the evaluation metric in such a way that it cannot be ascribed to chance (as measured by their p-values). Using this method {--} implemented as MLEval tool {--} you can find: (1) anomalies in test sets, (2) issues in preprocessing, (3) problems in the ML model itself. It can give you an insight into what can be improved in the datasets and/or the model. The same method can be used to compare ML models or different versions of the same model. We present the tool, the theory behind it and use cases for text-based models of various types."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="gralinski-etal-2019-geval">
    <titleInfo>
        <title>GEval: Tool for Debugging NLP Datasets and Models</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Filip</namePart>
        <namePart type="family">Graliński</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Wróblewska</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tomasz</namePart>
        <namePart type="family">Stanisławek</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kamil</namePart>
        <namePart type="family">Grabowski</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tomasz</namePart>
        <namePart type="family">Górecki</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tal</namePart>
            <namePart type="family">Linzen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Grzegorz</namePart>
            <namePart type="family">Chrupała</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yonatan</namePart>
            <namePart type="family">Belinkov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Dieuwke</namePart>
            <namePart type="family">Hupkes</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper presents a simple but general and effective method to debug the output of machine learning (ML) supervised models, including neural networks. The algorithm looks for features that lower the evaluation metric in such a way that it cannot be ascribed to chance (as measured by their p-values). Using this method – implemented as MLEval tool – you can find: (1) anomalies in test sets, (2) issues in preprocessing, (3) problems in the ML model itself. It can give you an insight into what can be improved in the datasets and/or the model. The same method can be used to compare ML models or different versions of the same model. We present the tool, the theory behind it and use cases for text-based models of various types.</abstract>
    <identifier type="citekey">gralinski-etal-2019-geval</identifier>
    <identifier type="doi">10.18653/v1/W19-4826</identifier>
    <location>
        <url>https://aclanthology.org/W19-4826/</url>
    </location>
    <part>
        <date>2019-08</date>
        <extent unit="page">
            <start>254</start>
            <end>262</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T GEval: Tool for Debugging NLP Datasets and Models
%A Graliński, Filip
%A Wróblewska, Anna
%A Stanisławek, Tomasz
%A Grabowski, Kamil
%A Górecki, Tomasz
%Y Linzen, Tal
%Y Chrupała, Grzegorz
%Y Belinkov, Yonatan
%Y Hupkes, Dieuwke
%S Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
%D 2019
%8 August
%I Association for Computational Linguistics
%C Florence, Italy
%F gralinski-etal-2019-geval
%X This paper presents a simple but general and effective method to debug the output of machine learning (ML) supervised models, including neural networks. The algorithm looks for features that lower the evaluation metric in such a way that it cannot be ascribed to chance (as measured by their p-values). Using this method – implemented as MLEval tool – you can find: (1) anomalies in test sets, (2) issues in preprocessing, (3) problems in the ML model itself. It can give you an insight into what can be improved in the datasets and/or the model. The same method can be used to compare ML models or different versions of the same model. We present the tool, the theory behind it and use cases for text-based models of various types.
%R 10.18653/v1/W19-4826
%U https://aclanthology.org/W19-4826/
%U https://doi.org/10.18653/v1/W19-4826
%P 254-262

Markdown (Informal)

[GEval: Tool for Debugging NLP Datasets and Models](https://aclanthology.org/W19-4826/) (Graliński et al., BlackboxNLP 2019)
