QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings

Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, Shuwang Han


Abstract
This paper describes our submissions to Task 1 of SemEval-2017, which assesses the semantic textual similarity of two sentences or texts. We submitted three unsupervised systems based on word embeddings. The runs differ in the preprocessing applied to the evaluation data. Our best system achieves a Pearson correlation of 0.6887. As expected, the results show that data preprocessing, such as tokenization, lemmatization, extraction of content words, and stop-word removal, plays a significant role in improving model performance.
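The pipeline the abstract outlines (preprocess each sentence, represent it with word embeddings, score the pair, and evaluate the scores by Pearson correlation) can be illustrated with a minimal sketch. The abstract does not say how word vectors are combined, so this sketch assumes a simple average of word vectors compared by cosine similarity. The stop-word list, the rescaling of cosine onto the 0 to 5 STS scale, and the toy embedding table are all hypothetical; a real system would load pretrained embeddings and would typically add lemmatization, which is omitted here.

```python
import math
import re

# Hypothetical stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to", "and"}


def preprocess(sentence):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words.
    Lemmatization is omitted in this sketch."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def sentence_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens (assumed composition).
    Returns None when no token has an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(s1, s2, embeddings):
    """Score a sentence pair on the 0-5 STS scale (assumed rescaling:
    negative cosines are clipped to 0, then multiplied by 5)."""
    u = sentence_vector(preprocess(s1), embeddings)
    v = sentence_vector(preprocess(s2), embeddings)
    if u is None or v is None:
        return 0.0
    return 5.0 * max(0.0, cosine(u, v))


def pearson(xs, ys):
    """Pearson correlation between system scores and gold scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy embeddings for illustration only.
toy = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "sits": [0.0, 1.0, 0.0],
    "mat": [0.0, 0.0, 1.0],
    "runs": [0.0, 0.8, 0.2],
}
```

Because stop words are removed before averaging, "The cat sits on the mat" and "the cat sits on a mat" reduce to the same content words and score (up to rounding) the maximum of 5.0. This illustrates why preprocessing can shift scores for an averaging model: function words would otherwise pull every sentence vector toward a shared, uninformative direction.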
Anthology ID:
S17-2020
Volume:
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Steven Bethard, Marine Carpuat, Marianna Apidianaki, Saif M. Mohammad, Daniel Cer, David Jurgens
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
150–153
URL:
https://aclanthology.org/S17-2020
DOI:
10.18653/v1/S17-2020
Cite (ACL):
Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, and Shuwang Han. 2017. QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 150–153, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings (Meng et al., SemEval 2017)
PDF:
https://aclanthology.org/S17-2020.pdf