What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, Chu-Ren Huang


Abstract
In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting this intersection according to the rank of the shared contexts in the dependency ranked lists. This claim rests on the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To test it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms both Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating non-English-speaking applicants to US colleges (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
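The measure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the formula, so the sketch assumes that each context shared by the two top-N lists contributes the inverse of its average rank across both lists. The function names (`top_contexts`, `apsyn_sketch`), the cutoff `n`, and the toy association scores are all hypothetical.

```python
from typing import Dict


def top_contexts(assoc: Dict[str, float], n: int) -> Dict[str, int]:
    """Map the n most associated contexts of a word to their 1-based rank."""
    # Sort by descending association score; break ties alphabetically
    # so that the ranking is deterministic.
    ranked = sorted(assoc, key=lambda c: (-assoc[c], c))[:n]
    return {context: rank for rank, context in enumerate(ranked, start=1)}


def apsyn_sketch(assoc1: Dict[str, float], assoc2: Dict[str, float], n: int = 1000) -> float:
    """Rank-weighted intersection of the top-n contexts of two words.

    Assumption (not stated in the abstract): a shared context is weighted
    by the inverse of its average rank in the two ranked lists.
    """
    ranks1 = top_contexts(assoc1, n)
    ranks2 = top_contexts(assoc2, n)
    shared = ranks1.keys() & ranks2.keys()
    return sum((1.0 / ((ranks1[c] + ranks2[c]) / 2.0) for c in shared), 0.0)


# Toy, made-up association scores (e.g. PMI-like values) for three words.
cat = {"purr": 5.0, "pet": 4.0, "fur": 3.0, "eat": 1.0}
dog = {"bark": 5.0, "pet": 4.5, "fur": 2.0, "eat": 1.5}
car = {"drive": 6.0, "wheel": 5.0, "eat": 0.1, "pet": 0.05}

print(apsyn_sketch(cat, dog, n=3))  # shares "pet" and "fur" among the top 3
print(apsyn_sketch(cat, car, n=3))  # no shared context among the top 3
```

Under these assumptions, words that share highly ranked contexts score higher than words whose overlap is confined to weakly associated contexts, which is the intuition the abstract describes. In a multiple-choice synonym test such as ESL or TOEFL, the candidate with the highest score against the target word would be selected.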
Anthology ID:
L16-1723
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4565–4572
URL:
https://aclanthology.org/L16-1723
Cite (ACL):
Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, and Chu-Ren Huang. 2016. What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4565–4572, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets (Santus et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1723.pdf