QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings

Fanqing Meng; Wenpeng Lu; Yuteng Zhang; Jinyong Cheng; Yuehan Du; Shuwang Han

doi:10.18653/v1/S17-2020

Abstract

This paper reports the details of our submissions to Task 1 of SemEval 2017. The task assesses the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The runs differ in the preprocessing applied to the evaluation data. The best Pearson correlation achieved by these systems is 0.6887. The results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words, and removal of stop words, plays a significant role in improving the performance of the models.
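The abstract does not say how the word embeddings are combined into a sentence score, so the following is only a minimal sketch of one common unsupervised approach: average the word vectors of each preprocessed sentence and take the cosine of the two averages. The toy embeddings, the stop-word list, and all function names are assumptions made for illustration and are not taken from the paper; lemmatization and content-word extraction are omitted.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sits": [0.1, 0.9, 0.1],
    "runs": [0.2, 0.8, 0.2],
    "car": [0.0, 0.1, 0.9],
    "drives": [0.1, 0.3, 0.8],
}
# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "on", "in", "of"}


def preprocess(sentence):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = [t.strip(".,;:!?") for t in sentence.lower().split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(s1, s2, embeddings=TOY_EMBEDDINGS):
    """Score a sentence pair; 0.0 if either side has no known words."""
    v1 = sentence_vector(preprocess(s1), embeddings)
    v2 = sentence_vector(preprocess(s2), embeddings)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


def pearson(xs, ys):
    """Pearson correlation between system scores and reference scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In an evaluation of this kind, `similarity` would be called on each sentence pair and `pearson` would compare the resulting scores with the human ratings. Changing what `preprocess` keeps or drops is the sort of variation the abstract describes between runs.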

Anthology ID: S17-2020
Volume: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Month: August
Year: 2017
Address: Vancouver, Canada
Editors: Steven Bethard, Marine Carpuat, Marianna Apidianaki, Saif M. Mohammad, Daniel Cer, David Jurgens
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 150–153
URL: https://aclanthology.org/S17-2020/
DOI: 10.18653/v1/S17-2020
Bibkey: meng-etal-2017-qlut
Cite (ACL): Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, and Shuwang Han. 2017. QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 150–153, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings (Meng et al., SemEval 2017)
PDF: https://aclanthology.org/S17-2020.pdf

BibTeX

@inproceedings{meng-etal-2017-qlut,
    title = "{QLUT} at {S}em{E}val-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings",
    author = "Meng, Fanqing  and
      Lu, Wenpeng  and
      Zhang, Yuteng  and
      Cheng, Jinyong  and
      Du, Yuehan  and
      Han, Shuwang",
    editor = "Bethard, Steven  and
      Carpuat, Marine  and
      Apidianaki, Marianna  and
      Mohammad, Saif M.  and
      Cer, Daniel  and
      Jurgens, David",
    booktitle = "Proceedings of the 11th International Workshop on Semantic Evaluation ({S}em{E}val-2017)",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S17-2020/",
    doi = "10.18653/v1/S17-2020",
    pages = "150--153",
    abstract = "This paper reports the details of our submissions in the task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The differences between these runs are the various preprocessing on evaluation data. The best performance of these systems on the evaluation of Pearson correlation is 0.6887. Unsurprisingly, results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words and removing stop words, is helpful and plays a significant role in improving the performance of models."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="meng-etal-2017-qlut">
    <titleInfo>
        <title>QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Fanqing</namePart>
        <namePart type="family">Meng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Wenpeng</namePart>
        <namePart type="family">Lu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yuteng</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jinyong</namePart>
        <namePart type="family">Cheng</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yuehan</namePart>
        <namePart type="family">Du</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shuwang</namePart>
        <namePart type="family">Han</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Steven</namePart>
            <namePart type="family">Bethard</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marine</namePart>
            <namePart type="family">Carpuat</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marianna</namePart>
            <namePart type="family">Apidianaki</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Saif</namePart>
            <namePart type="given">M</namePart>
            <namePart type="family">Mohammad</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Daniel</namePart>
            <namePart type="family">Cer</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Jurgens</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vancouver, Canada</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper reports the details of our submissions in the task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The differences between these runs are the various preprocessing on evaluation data. The best performance of these systems on the evaluation of Pearson correlation is 0.6887. Unsurprisingly, results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words and removing stop words, is helpful and plays a significant role in improving the performance of models.</abstract>
    <identifier type="citekey">meng-etal-2017-qlut</identifier>
    <identifier type="doi">10.18653/v1/S17-2020</identifier>
    <location>
        <url>https://aclanthology.org/S17-2020/</url>
    </location>
    <part>
        <date>2017-08</date>
        <extent unit="page">
            <start>150</start>
            <end>153</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings
%A Meng, Fanqing
%A Lu, Wenpeng
%A Zhang, Yuteng
%A Cheng, Jinyong
%A Du, Yuehan
%A Han, Shuwang
%Y Bethard, Steven
%Y Carpuat, Marine
%Y Apidianaki, Marianna
%Y Mohammad, Saif M.
%Y Cer, Daniel
%Y Jurgens, David
%S Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
%D 2017
%8 August
%I Association for Computational Linguistics
%C Vancouver, Canada
%F meng-etal-2017-qlut
%X This paper reports the details of our submissions in the task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The differences between these runs are the various preprocessing on evaluation data. The best performance of these systems on the evaluation of Pearson correlation is 0.6887. Unsurprisingly, results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words and removing stop words, is helpful and plays a significant role in improving the performance of models.
%R 10.18653/v1/S17-2020
%U https://aclanthology.org/S17-2020/
%U https://doi.org/10.18653/v1/S17-2020
%P 150-153

Markdown (Informal)

[QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings](https://aclanthology.org/S17-2020/) (Meng et al., SemEval 2017)