Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality

Thomas Pickard

Abstract

This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.

Anthology ID: 2020.mwe-1.12
Volume: Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month: December
Year: 2020
Address: online
Editors: Stella Markantonatou, John McCrae, Jelena Mitrović, Carole Tiberius, Carlos Ramisch, Ashwini Vaidya, Petya Osenova, Agata Savary
Venue: MWE
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 95–100
URL: https://aclanthology.org/2020.mwe-1.12/
Cite (ACL): Thomas Pickard. 2020. Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 95–100, online. Association for Computational Linguistics.
Cite (Informal): Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality (Pickard, MWE 2020)
PDF: https://aclanthology.org/2020.mwe-1.12.pdf

Export citation

BibTeX

@inproceedings{pickard-2020-comparing,
    title = "Comparing word2vec and {G}lo{V}e for Automatic Measurement of {MWE} Compositionality",
    author = "Pickard, Thomas",
    editor = "Markantonatou, Stella  and
      McCrae, John  and
      Mitrovi{\'c}, Jelena  and
      Tiberius, Carole  and
      Ramisch, Carlos  and
      Vaidya, Ashwini  and
      Osenova, Petya  and
      Savary, Agata",
    booktitle = "Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons",
    month = dec,
    year = "2020",
    address = "online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.mwe-1.12/",
    pages = "95--100",
    abstract = "This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10{\%} of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="pickard-2020-comparing">
    <titleInfo>
        <title>Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Thomas</namePart>
        <namePart type="family">Pickard</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Stella</namePart>
            <namePart type="family">Markantonatou</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">John</namePart>
            <namePart type="family">McCrae</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jelena</namePart>
            <namePart type="family">Mitrović</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Carole</namePart>
            <namePart type="family">Tiberius</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Carlos</namePart>
            <namePart type="family">Ramisch</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ashwini</namePart>
            <namePart type="family">Vaidya</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Petya</namePart>
            <namePart type="family">Osenova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Agata</namePart>
            <namePart type="family">Savary</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.</abstract>
    <identifier type="citekey">pickard-2020-comparing</identifier>
    <location>
        <url>https://aclanthology.org/2020.mwe-1.12/</url>
    </location>
    <part>
        <date>2020-12</date>
        <extent unit="page">
            <start>95</start>
            <end>100</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality
%A Pickard, Thomas
%Y Markantonatou, Stella
%Y McCrae, John
%Y Mitrović, Jelena
%Y Tiberius, Carole
%Y Ramisch, Carlos
%Y Vaidya, Ashwini
%Y Osenova, Petya
%Y Savary, Agata
%S Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
%D 2020
%8 December
%I Association for Computational Linguistics
%C online
%F pickard-2020-comparing
%X This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.
%U https://aclanthology.org/2020.mwe-1.12/
%P 95-100

Markdown (Informal)

[Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality](https://aclanthology.org/2020.mwe-1.12/) (Pickard, MWE 2020)
