Just Rank: Rethinking Evaluation with Word and Sentence Similarities

Bin Wang, C.-C. Jay Kuo, Haizhou Li


Abstract
Word and sentence embeddings are useful feature representations in natural language processing. However, intrinsic evaluation for embeddings lags far behind, with no significant update in the past decade. Word and sentence similarity tasks have become the de facto evaluation method, leading models to overfit to these benchmarks and hindering the development of embedding models. This paper first points out the problems with using semantic similarity as the gold standard for word and sentence embedding evaluation. We then propose a new intrinsic evaluation method, EvalRank, which shows a much stronger correlation with downstream tasks. Extensive experiments on more than 60 models and popular datasets support our claims. Finally, we release a practical evaluation toolkit for future benchmarking.
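The abstract describes a ranking-based intrinsic evaluation. The following is a minimal, hypothetical sketch of that idea (not the authors' released EvalRank code): for each positive pair of embeddings, rank the paired candidate against a background pool by cosine similarity, then report mean reciprocal rank (MRR) and Hits@k. The function name `rank_eval` and the toy data are assumptions for illustration.

```python
# Hypothetical sketch of a ranking-based embedding evaluation.
# For each positive pair (i, j), rank embedding j among all candidates
# by cosine similarity to embedding i; report MRR and Hits@k.
import numpy as np

def rank_eval(embeddings, positive_pairs, k=3):
    """embeddings: (n, d) array; positive_pairs: list of (i, j) index pairs."""
    # L2-normalize rows so dot products equal cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    ranks = []
    for i, j in positive_pairs:
        sims = emb @ emb[i]        # cosine similarity of query i to every candidate
        sims[i] = -np.inf          # exclude the query itself from the ranking
        rank = 1 + np.sum(sims > sims[j])  # rank of the positive candidate (1 = best)
        ranks.append(rank)
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits_at_k = float(np.mean(ranks <= k))
    return mrr, hits_at_k

# Toy usage: random background vectors plus two near-duplicate pairs,
# which a reasonable ranking metric should place at rank 1.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 8))
base[1] = base[0] + 0.01 * rng.normal(size=8)   # pair (0, 1) is similar
base[3] = base[2] + 0.01 * rng.normal(size=8)   # pair (2, 3) is similar
mrr, hits = rank_eval(base, [(0, 1), (2, 3)], k=1)
print(mrr, hits)
```

An evaluation of this shape rewards embeddings that rank semantically related items above unrelated background items, which correlates with retrieval-style downstream use more directly than matching human similarity scores does.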
Anthology ID:
2022.acl-long.419
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6060–6077
URL:
https://aclanthology.org/2022.acl-long.419
DOI:
10.18653/v1/2022.acl-long.419
Cite (ACL):
Bin Wang, C.-C. Jay Kuo, and Haizhou Li. 2022. Just Rank: Rethinking Evaluation with Word and Sentence Similarities. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6060–6077, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Just Rank: Rethinking Evaluation with Word and Sentence Similarities (Wang et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.419.pdf
Video:
https://aclanthology.org/2022.acl-long.419.mp4
Code:
binwang28/evalrank-embedding-evaluation
Data:
GLUE, MPQA Opinion Corpus, SST, SST-2, SST-5, SciCite, SentEval