An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

Elena Álvarez-Mellado

Abstract

The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
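The abstract describes a conditional random field baseline with handcrafted features, but this page does not list those features. The snippet below is a minimal sketch of the feature side of such a tagger. It assumes pre-tokenized headlines and a token-level labeling scheme, for example BIO tags; that scheme is an assumption, since this page does not specify one. The cue list `ENGLISH_LIKE`, the functions `token_features` and `headline_features`, and the example headline are hypothetical illustrations, not the paper's feature set.

```python
import re
from typing import Dict, List, Union

Feature = Union[str, bool]

# Hypothetical cue list: letter sequences that are rare in native Spanish
# spelling but frequent in English. Illustrative only, not the paper's list.
ENGLISH_LIKE = ("sh", "th", "w", "k", "ck", "oo", "ee")
SPANISH_CHARS = re.compile(r"[ñáéíóúü]")


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Return a feature dict for tokens[i], one dict per token as CRF toolkits expect."""
    tok = tokens[i]
    low = tok.lower()
    feats: Dict[str, Feature] = {
        "lower": low,
        "prefix3": low[:3],
        "suffix3": low[-3:],
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # Accented vowels and ñ point towards a native Spanish word.
        "has_spanish_char": bool(SPANISH_CHARS.search(low)),
        # True if any English-like letter sequence occurs in the token.
        "english_pattern": any(p in low for p in ENGLISH_LIKE),
        "ends_in_ing": low.endswith("ing"),
    }
    # Context window of one token on each side; BOS/EOS mark headline edges.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def headline_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token: the observation sequence for a linear-chain CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical, pre-tokenized headline; gold labels would be aligned per token.
    headline = ["El", "boom", "del", "running", "en", "Madrid"]
    for tok, f in zip(headline, headline_features(headline)):
        print(tok, f["english_pattern"], f["ends_in_ing"], f["has_spanish_char"])
```

Training would pair each headline's list of feature dicts with a same-length list of labels and pass both to a CRF toolkit (for instance a CRFsuite-based wrapper); that step is omitted here. Keeping features as plain dicts makes it easy to add or ablate individual cues when comparing against a baseline.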

Anthology ID: 2020.calcs-1.1
Volume: Proceedings of the 4th Workshop on Computational Approaches to Code Switching
Month: May
Year: 2020
Address: Marseille, France
Editors: Thamar Solorio, Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Amitava Das, Mona Diab
Venue: CALCS
Publisher: European Language Resources Association
Pages: 1–8
Language: English
URL: https://aclanthology.org/2020.calcs-1.1/
Bibkey: alvarez-mellado-2020-annotated
Cite (ACL): Elena Alvarez-Mellado. 2020. An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines. In Proceedings of the 4th Workshop on Computational Approaches to Code Switching, pages 1–8, Marseille, France. European Language Resources Association.
Cite (Informal): An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines (Alvarez-Mellado, CALCS 2020)
PDF: https://aclanthology.org/2020.calcs-1.1.pdf

Export citation

BibTeX

@inproceedings{alvarez-mellado-2020-annotated,
    title = "An Annotated Corpus of Emerging Anglicisms in {S}panish Newspaper Headlines",
    author = "Alvarez-Mellado, Elena",
    editor = "Solorio, Thamar  and
      Choudhury, Monojit  and
      Bali, Kalika  and
      Sitaram, Sunayana  and
      Das, Amitava  and
      Diab, Mona",
    booktitle = "Proceedings of the 4th Workshop on Computational Approaches to Code Switching",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.calcs-1.1/",
    pages = "1--8",
    language = "eng",
    ISBN = "979-10-95546-66-5",
    abstract = "The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="alvarez-mellado-2020-annotated">
    <titleInfo>
        <title>An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Elena</namePart>
        <namePart type="family">Alvarez-Mellado</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <language>
        <languageTerm type="text">eng</languageTerm>
    </language>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 4th Workshop on Computational Approaches to Code Switching</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Thamar</namePart>
            <namePart type="family">Solorio</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Monojit</namePart>
            <namePart type="family">Choudhury</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kalika</namePart>
            <namePart type="family">Bali</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sunayana</namePart>
            <namePart type="family">Sitaram</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Amitava</namePart>
            <namePart type="family">Das</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mona</namePart>
            <namePart type="family">Diab</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>European Language Resources Association</publisher>
            <place>
                <placeTerm type="text">Marseille, France</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-10-95546-66-5</identifier>
    </relatedItem>
    <abstract>The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.</abstract>
    <identifier type="citekey">alvarez-mellado-2020-annotated</identifier>
    <location>
        <url>https://aclanthology.org/2020.calcs-1.1/</url>
    </location>
    <part>
        <date>2020-05</date>
        <extent unit="page">
            <start>1</start>
            <end>8</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines
%A Alvarez-Mellado, Elena
%Y Solorio, Thamar
%Y Choudhury, Monojit
%Y Bali, Kalika
%Y Sitaram, Sunayana
%Y Das, Amitava
%Y Diab, Mona
%S Proceedings of the 4th Workshop on Computational Approaches to Code Switching
%D 2020
%8 May
%I European Language Resources Association
%C Marseille, France
%@ 979-10-95546-66-5
%G eng
%F alvarez-mellado-2020-annotated
%X The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
%U https://aclanthology.org/2020.calcs-1.1/
%P 1-8

Preformatted

Markdown (Informal)

[An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines](https://aclanthology.org/2020.calcs-1.1/) (Alvarez-Mellado, CALCS 2020)