Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition

Genta Indra Winata; Chien-Sheng Wu; Andrea Madotto; Pascale Fung

doi:10.18653/v1/W18-3214

Abstract

We propose an LSTM-based model with a hierarchical architecture for named entity recognition on code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. To mitigate data noise, we propose token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with a 62.76% harmonic mean F1-score for the English-Spanish language pair without using any gazetteer or knowledge-based information.
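The abstract names token replacement and normalization as the way to reduce noise in tweets but does not spell out the rules. The following is a minimal sketch of what such a preprocessing step could look like, not the authors' implementation. The placeholder tokens `<url>` and `<user>`, the hashtag handling, the repeated-character rule, and the function names are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical placeholder tokens; the paper's actual replacement scheme may differ.
URL_TOKEN = "<url>"
USER_TOKEN = "<user>"

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # one character repeated three or more times


def normalize_token(token: str) -> str:
    """Map a raw tweet token to a less noisy form (illustrative rules only)."""
    if URL_RE.fullmatch(token):
        return URL_TOKEN   # token replacement: every URL becomes one symbol
    if USER_RE.fullmatch(token):
        return USER_TOKEN  # token replacement: every mention becomes one symbol
    if token.startswith("#") and len(token) > 1:
        token = token[1:]  # keep the hashtag text, drop the marker
    # Normalization: collapse elongated spellings, e.g. "soooo" -> "soo".
    return REPEAT_RE.sub(r"\1\1", token)


def normalize_tweet(tokens: List[str]) -> List[str]:
    """Apply normalize_token to every token of an already tokenized tweet."""
    return [normalize_token(t) for t in tokens]


print(normalize_tweet(["@maria", "vamos", "al", "#Miami", "soooo", "cool", "https://t.co/abc"]))
# prints ['<user>', 'vamos', 'al', 'Miami', 'soo', 'cool', '<url>']
```

Mapping open-ended token classes such as URLs and user mentions to a single symbol shrinks the vocabulary, so fewer tokens fall outside it. Words that remain unseen after this step would still need to be handled by the character-level representation the abstract describes.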

Anthology ID: W18-3214
Volume: Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Thamar Solorio, Mona Diab, Julia Hirschberg
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 110–114
URL: https://aclanthology.org/W18-3214/
DOI: 10.18653/v1/W18-3214
Bibkey: winata-etal-2018-bilingual
Cite (ACL): Genta Indra Winata, Chien-Sheng Wu, Andrea Madotto, and Pascale Fung. 2018. Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, pages 110–114, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition (Winata et al., ACL 2018)
PDF: https://aclanthology.org/W18-3214.pdf

Export citation

BibTeX

@inproceedings{winata-etal-2018-bilingual,
    title = "Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition",
    author = "Winata, Genta Indra  and
      Wu, Chien-Sheng  and
      Madotto, Andrea  and
      Fung, Pascale",
    editor = "Aguilar, Gustavo  and
      AlGhamdi, Fahad  and
      Soto, Victor  and
      Solorio, Thamar  and
      Diab, Mona  and
      Hirschberg, Julia",
    booktitle = "Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-3214/",
    doi = "10.18653/v1/W18-3214",
    pages = "110--114",
    abstract = "We propose an LSTM-based model with hierarchical architecture on named entity recognition from code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. In order to mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with 62.76{\%} harmonic mean F1-score for English-Spanish language pair without using any gazetteer and knowledge-based information."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="winata-etal-2018-bilingual">
    <titleInfo>
        <title>Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Genta</namePart>
        <namePart type="given">Indra</namePart>
        <namePart type="family">Winata</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chien-Sheng</namePart>
        <namePart type="family">Wu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Andrea</namePart>
        <namePart type="family">Madotto</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Pascale</namePart>
        <namePart type="family">Fung</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Gustavo</namePart>
            <namePart type="family">Aguilar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Fahad</namePart>
            <namePart type="family">AlGhamdi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Victor</namePart>
            <namePart type="family">Soto</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Thamar</namePart>
            <namePart type="family">Solorio</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mona</namePart>
            <namePart type="family">Diab</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Julia</namePart>
            <namePart type="family">Hirschberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Melbourne, Australia</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We propose an LSTM-based model with hierarchical architecture on named entity recognition from code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. In order to mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with 62.76% harmonic mean F1-score for English-Spanish language pair without using any gazetteer and knowledge-based information.</abstract>
    <identifier type="citekey">winata-etal-2018-bilingual</identifier>
    <identifier type="doi">10.18653/v1/W18-3214</identifier>
    <location>
        <url>https://aclanthology.org/W18-3214/</url>
    </location>
    <part>
        <date>2018-07</date>
        <extent unit="page">
            <start>110</start>
            <end>114</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition
%A Winata, Genta Indra
%A Wu, Chien-Sheng
%A Madotto, Andrea
%A Fung, Pascale
%Y Aguilar, Gustavo
%Y AlGhamdi, Fahad
%Y Soto, Victor
%Y Solorio, Thamar
%Y Diab, Mona
%Y Hirschberg, Julia
%S Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
%D 2018
%8 July
%I Association for Computational Linguistics
%C Melbourne, Australia
%F winata-etal-2018-bilingual
%X We propose an LSTM-based model with hierarchical architecture on named entity recognition from code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. In order to mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with 62.76% harmonic mean F1-score for English-Spanish language pair without using any gazetteer and knowledge-based information.
%R 10.18653/v1/W18-3214
%U https://aclanthology.org/W18-3214/
%U https://doi.org/10.18653/v1/W18-3214
%P 110-114

Markdown (Informal)

[Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition](https://aclanthology.org/W18-3214/) (Winata et al., ACL 2018)