Hypothesis Testing based Intrinsic Evaluation of Word Embeddings

Nishant Gurnani

doi:10.18653/v1/W17-5303

Abstract

We introduce the cross-match test, an exact, distribution-free, high-dimensional hypothesis test, as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring the distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two-sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations.
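The abstract names the cross-match test but does not spell out how it works. The following is a minimal sketch of Rosenbaum's (2005) cross-match test as it is commonly described, not code from the paper. It pools both samples, pairs the points with a minimum-distance non-bipartite perfect matching, and counts the cross-matched pairs A1. It then reads a one-sided p-value off the exact null distribution of A1, where few cross-matches suggest the two samples come from different distributions. Assumptions: Euclidean distance, an even pooled sample size, and a brute-force bitmask matcher that is only practical for roughly 20 points. The helper names (null_pmf, min_distance_matching, cross_match_test) are hypothetical.

```python
import math
from fractions import Fraction
from functools import lru_cache


def null_pmf(a1, n1, n2):
    """Exact null probability P(A1 = a1) of a1 cross-matched pairs when
    n1 + n2 pooled points are perfectly matched (Rosenbaum's formula)."""
    total = n1 + n2
    if total % 2 or a1 < 0 or a1 > min(n1, n2) or (n1 - a1) % 2:
        return Fraction(0)
    pairs = total // 2
    a2 = (n1 - a1) // 2  # pairs with both points from sample 1
    a0 = (n2 - a1) // 2  # pairs with both points from sample 2
    numerator = 2 ** a1 * math.factorial(pairs)
    denominator = (math.comb(total, n1) * math.factorial(a0)
                   * math.factorial(a1) * math.factorial(a2))
    return Fraction(numerator, denominator)


def min_distance_matching(points):
    """Minimum total-distance non-bipartite perfect matching via exhaustive
    bitmask dynamic programming (assumption: exponential, small inputs only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    @lru_cache(maxsize=None)
    def best(mask):
        # mask marks points still unmatched; returns (cost, tuple of pairs)
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest-index unmatched point
        rest = mask & ~(1 << i)
        best_cost, best_pairs = math.inf, ()
        for j in range(i + 1, n):
            if rest >> j & 1:
                cost, pairs = best(rest & ~(1 << j))
                cost += dist[i][j]
                if cost < best_cost:
                    best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return list(best((1 << n) - 1)[1])


def cross_match_test(x, y):
    """Return (A1, one-sided p-value P(A1 <= observed)) for samples x and y."""
    points = list(x) + list(y)
    if len(points) % 2:
        raise ValueError("pooled sample size must be even")
    labels = [0] * len(x) + [1] * len(y)
    # The matching sees only the unlabeled pooled points.
    pairs = min_distance_matching(points)
    a1 = sum(labels[i] != labels[j] for i, j in pairs)
    p_value = sum(null_pmf(k, len(x), len(y)) for k in range(a1 + 1))
    return a1, p_value


x = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(cross_match_test(x, y))  # (0, Fraction(3, 35))
```

The matching is computed without looking at the sample labels, so under the null hypothesis the distribution of A1 depends only on the two sample sizes. This is what makes the test exact and distribution-free. For realistic sets of embedding vectors, the exhaustive matcher would need to be replaced by a polynomial-time non-bipartite matching algorithm.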

Anthology ID:: W17-5303
Volume:: Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
Month:: September
Year:: 2017
Address:: Copenhagen, Denmark
Editors:: Samuel R. Bowman, Yoav Goldberg, Felix Hill, Angeliki Lazaridou, Omer Levy, Roi Reichart, Anders Søgaard
Venue:: RepEval
Publisher:: Association for Computational Linguistics
Pages:: 16–20
URL:: https://aclanthology.org/W17-5303/
DOI:: 10.18653/v1/W17-5303
Bibkey:: gurnani-2017-hypothesis
Cite (ACL):: Nishant Gurnani. 2017. Hypothesis Testing based Intrinsic Evaluation of Word Embeddings. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, pages 16–20, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):: Hypothesis Testing based Intrinsic Evaluation of Word Embeddings (Gurnani, RepEval 2017)
PDF:: https://aclanthology.org/W17-5303.pdf

BibTeX

@inproceedings{gurnani-2017-hypothesis,
    title = "Hypothesis Testing based Intrinsic Evaluation of Word Embeddings",
    author = "Gurnani, Nishant",
    editor = "Bowman, Samuel R.  and
      Goldberg, Yoav  and
      Hill, Felix  and
      Lazaridou, Angeliki  and
      Levy, Omer  and
      Reichart, Roi  and
      S{\o}gaard, Anders",
    booktitle = "Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for {NLP}",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-5303/",
    doi = "10.18653/v1/W17-5303",
    pages = "16--20",
    abstract = "We introduce the cross-match test - an exact, distribution free, high-dimensional hypothesis test as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring the distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="gurnani-2017-hypothesis">
    <titleInfo>
        <title>Hypothesis Testing based Intrinsic Evaluation of Word Embeddings</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Nishant</namePart>
        <namePart type="family">Gurnani</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-09</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Samuel</namePart>
            <namePart type="given">R</namePart>
            <namePart type="family">Bowman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yoav</namePart>
            <namePart type="family">Goldberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Felix</namePart>
            <namePart type="family">Hill</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Angeliki</namePart>
            <namePart type="family">Lazaridou</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Omer</namePart>
            <namePart type="family">Levy</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Roi</namePart>
            <namePart type="family">Reichart</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Anders</namePart>
            <namePart type="family">Søgaard</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Copenhagen, Denmark</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We introduce the cross-match test - an exact, distribution free, high-dimensional hypothesis test as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring the distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations.</abstract>
    <identifier type="citekey">gurnani-2017-hypothesis</identifier>
    <identifier type="doi">10.18653/v1/W17-5303</identifier>
    <location>
        <url>https://aclanthology.org/W17-5303/</url>
    </location>
    <part>
        <date>2017-09</date>
        <extent unit="page">
            <start>16</start>
            <end>20</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Hypothesis Testing based Intrinsic Evaluation of Word Embeddings
%A Gurnani, Nishant
%Y Bowman, Samuel R.
%Y Goldberg, Yoav
%Y Hill, Felix
%Y Lazaridou, Angeliki
%Y Levy, Omer
%Y Reichart, Roi
%Y Søgaard, Anders
%S Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
%D 2017
%8 September
%I Association for Computational Linguistics
%C Copenhagen, Denmark
%F gurnani-2017-hypothesis
%X We introduce the cross-match test - an exact, distribution free, high-dimensional hypothesis test as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring the distributional similarity between different vector representations and of evaluating the statistical significance of different vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations.
%R 10.18653/v1/W17-5303
%U https://aclanthology.org/W17-5303/
%U https://doi.org/10.18653/v1/W17-5303
%P 16-20

Markdown (Informal)

[Hypothesis Testing based Intrinsic Evaluation of Word Embeddings](https://aclanthology.org/W17-5303/) (Gurnani, RepEval 2017)