Comparison of Representations of Named Entities for Document Classification

Lidia Pivovarova; Roman Yangarber

doi:10.18653/v1/W18-3008

Abstract

We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
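The abstract compares several ways of turning a multi-word name into classifier features. The sketch below illustrates three of them on a toy sentence: splitting the name into tokens, keeping it as one joined feature, and replacing it with its entity type. It is only an illustration under stated assumptions, not the authors' preprocessing: the function `represent`, the mode names, the `(start, end, type)` span format, and the example sentence are all hypothetical, and the "joined" mode is an assumed baseline that the abstract does not name explicitly.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, entity_type) over a token list,
# with `end` exclusive. A real pipeline would get these from an NE tagger.
Span = Tuple[int, int, str]


def represent(tokens: List[str], spans: List[Span], mode: str) -> List[str]:
    """Return the feature sequence for one document.

    mode="split":  every token of a name is kept as a separate feature.
    mode="joined": a multi-word name becomes a single feature (assumed baseline).
    mode="type":   a name is replaced by its entity type.
    """
    if mode not in {"split", "joined", "type"}:
        raise ValueError(f"unknown mode: {mode}")

    # Index spans by their start position for a single left-to-right pass.
    by_start = {start: (end, etype) for start, end, etype in spans}

    features: List[str] = []
    i = 0
    while i < len(tokens):
        if mode != "split" and i in by_start:
            end, etype = by_start[i]
            if mode == "joined":
                features.append("_".join(tokens[i:end]))
            else:
                features.append(etype)
            i = end  # skip past the whole name
        else:
            features.append(tokens[i])
            i += 1
    return features


# Toy input (not taken from RCV1).
tokens = ["Bank", "of", "England", "raised", "rates"]
spans = [(0, 3, "ORG")]

for mode in ("split", "joined", "type"):
    print(mode, represent(tokens, spans, mode))
```

In the "split" mode the name contributes three ordinary tokens, which is the treatment the abstract reports as working best; the "type" mode corresponds to the entity-type replacement that the abstract reports as not effective.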

Anthology ID: W18-3008
Volume: Proceedings of the Third Workshop on Representation Learning for NLP
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Isabelle Augenstein, Kris Cao, He He, Felix Hill, Spandana Gella, Jamie Kiros, Hongyuan Mei, Dipendra Misra
Venue: RepL4NLP
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 64–68
URL: https://aclanthology.org/W18-3008/
DOI: 10.18653/v1/W18-3008
Bibkey: pivovarova-yangarber-2018-comparison
Cite (ACL): Lidia Pivovarova and Roman Yangarber. 2018. Comparison of Representations of Named Entities for Document Classification. In Proceedings of the Third Workshop on Representation Learning for NLP, pages 64–68, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Comparison of Representations of Named Entities for Document Classification (Pivovarova & Yangarber, RepL4NLP 2018)
PDF: https://aclanthology.org/W18-3008.pdf

BibTeX

@inproceedings{pivovarova-yangarber-2018-comparison,
    title = "Comparison of Representations of Named Entities for Document Classification",
    author = "Pivovarova, Lidia  and
      Yangarber, Roman",
    editor = "Augenstein, Isabelle  and
      Cao, Kris  and
      He, He  and
      Hill, Felix  and
      Gella, Spandana  and
      Kiros, Jamie  and
      Mei, Hongyuan  and
      Misra, Dipendra",
    booktitle = "Proceedings of the Third Workshop on Representation Learning for {NLP}",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-3008/",
    doi = "10.18653/v1/W18-3008",
    pages = "64--68",
    abstract = "We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="pivovarova-yangarber-2018-comparison">
    <titleInfo>
        <title>Comparison of Representations of Named Entities for Document Classification</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Lidia</namePart>
        <namePart type="family">Pivovarova</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Roman</namePart>
        <namePart type="family">Yangarber</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Workshop on Representation Learning for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Isabelle</namePart>
            <namePart type="family">Augenstein</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kris</namePart>
            <namePart type="family">Cao</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">He</namePart>
            <namePart type="family">He</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Felix</namePart>
            <namePart type="family">Hill</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Spandana</namePart>
            <namePart type="family">Gella</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jamie</namePart>
            <namePart type="family">Kiros</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Hongyuan</namePart>
            <namePart type="family">Mei</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Dipendra</namePart>
            <namePart type="family">Misra</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Melbourne, Australia</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.</abstract>
    <identifier type="citekey">pivovarova-yangarber-2018-comparison</identifier>
    <identifier type="doi">10.18653/v1/W18-3008</identifier>
    <location>
        <url>https://aclanthology.org/W18-3008/</url>
    </location>
    <part>
        <date>2018-07</date>
        <extent unit="page">
            <start>64</start>
            <end>68</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Comparison of Representations of Named Entities for Document Classification
%A Pivovarova, Lidia
%A Yangarber, Roman
%Y Augenstein, Isabelle
%Y Cao, Kris
%Y He, He
%Y Hill, Felix
%Y Gella, Spandana
%Y Kiros, Jamie
%Y Mei, Hongyuan
%Y Misra, Dipendra
%S Proceedings of the Third Workshop on Representation Learning for NLP
%D 2018
%8 July
%I Association for Computational Linguistics
%C Melbourne, Australia
%F pivovarova-yangarber-2018-comparison
%X We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
%R 10.18653/v1/W18-3008
%U https://aclanthology.org/W18-3008/
%U https://doi.org/10.18653/v1/W18-3008
%P 64-68
