Interpreting Neural Networks with Nearest Neighbors

Eric Wallace; Shi Feng; Jordan Lee Boyd-Graber

doi:10.18653/v1/W18-5416

Interpreting Neural Networks with Nearest Neighbors

Eric Wallace, Shi Feng, Jordan Boyd-Graber

Correct Metadata for

Use this form to create a GitHub issue with structured data describing the correction. You will need a GitHub account. Once you create that issue, the correction will be reviewed by a staff member.

⚠️ Mobile Users: Submitting this form to create a new issue will only work with github.com, not the GitHub Mobile app.

Important: The Anthology treat PDFs as authoritative. Please use this form only to correct data that is out of line with the PDF. See our corrections guidelines if you need to change the PDF.

Title Adjust the title. Retain tags such as <fixed-case>.

Authors Adjust author names and order to match the PDF.

Abstract Correct abstract if needed. Retain XML formatting tags such as <tex-math>. You may use ... for bold, ... for italic, ... for underline, <sc>...</sc> for small-caps, <tt>...<tt> for typewriter text, <url>...</url> for URLs, <a href=...> for hyperlinks, and <par/> for paragraph breaks.

Verification against PDF Ensure that the new title/authors match the snapshot below. (If there is no snapshot or it is too small, consult the PDF.)

Authors concatenated from the text boxes above:

ALL author names match the snapshot above—including middle initials, hyphens, and accents.

Abstract

Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.

Anthology ID:: W18-5416
Volume:: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Month:: November
Year:: 2018
Address:: Brussels, Belgium
Editors:: Tal Linzen, Grzegorz Chrupała, Afra Alishahi
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 136–144
Language:
URL:: https://aclanthology.org/W18-5416/
DOI:: 10.18653/v1/W18-5416
Bibkey:
Cite (ACL):: Eric Wallace, Shi Feng, and Jordan Boyd-Graber. 2018. Interpreting Neural Networks with Nearest Neighbors. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 136–144, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):: Interpreting Neural Networks with Nearest Neighbors (Wallace et al., EMNLP 2018)
Copy Citation:
PDF:: https://aclanthology.org/W18-5416.pdf

PDF Cite Search Fix data

Export citation

BibTeX
MODS XML
Endnote
Preformatted

@inproceedings{wallace-etal-2018-interpreting,
    title = "Interpreting Neural Networks with Nearest Neighbors",
    author = "Wallace, Eric  and
      Feng, Shi  and
      Boyd-Graber, Jordan",
    editor = "Linzen, Tal  and
      Chrupa{\l}a, Grzegorz  and
      Alishahi, Afra",
    booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}",
    month = nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-5416/",
    doi = "10.18653/v1/W18-5416",
    pages = "136--144",
    abstract = "Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts."
}

Download as File

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wallace-etal-2018-interpreting">
    <titleInfo>
        <title>Interpreting Neural Networks with Nearest Neighbors</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Eric</namePart>
        <namePart type="family">Wallace</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shi</namePart>
        <namePart type="family">Feng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jordan</namePart>
        <namePart type="family">Boyd-Graber</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tal</namePart>
            <namePart type="family">Linzen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Grzegorz</namePart>
            <namePart type="family">Chrupała</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Afra</namePart>
            <namePart type="family">Alishahi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Brussels, Belgium</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.</abstract>
    <identifier type="citekey">wallace-etal-2018-interpreting</identifier>
    <identifier type="doi">10.18653/v1/W18-5416</identifier>
    <location>
        <url>https://aclanthology.org/W18-5416/</url>
    </location>
    <part>
        <date>2018-11</date>
        <extent unit="page">
            <start>136</start>
            <end>144</end>
        </extent>
    </part>
</mods>
</modsCollection>

Download as File

%0 Conference Proceedings
%T Interpreting Neural Networks with Nearest Neighbors
%A Wallace, Eric
%A Feng, Shi
%A Boyd-Graber, Jordan
%Y Linzen, Tal
%Y Chrupała, Grzegorz
%Y Alishahi, Afra
%S Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
%D 2018
%8 November
%I Association for Computational Linguistics
%C Brussels, Belgium
%F wallace-etal-2018-interpreting
%X Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.
%R 10.18653/v1/W18-5416
%U https://aclanthology.org/W18-5416/
%U https://doi.org/10.18653/v1/W18-5416
%P 136-144

Download as File

Markdown (Informal)

[Interpreting Neural Networks with Nearest Neighbors](https://aclanthology.org/W18-5416/) (Wallace et al., EMNLP 2018)

Interpreting Neural Networks with Nearest Neighbors (Wallace et al., EMNLP 2018)

ACL

Eric Wallace, Shi Feng, and Jordan Boyd-Graber. 2018. Interpreting Neural Networks with Nearest Neighbors. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 136–144, Brussels, Belgium. Association for Computational Linguistics.