Why is sentence similarity benchmark not predictive of application-oriented task performance?

Authors: Kaori Abe, Sho Yokoi, Tomoyuki Kajiwara, Kentaro Inui
Published in: Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, pages 70-87
Editors: Daniel Deutsch, Can Udomcharoenchaikit, Juri Opitz, Yang Gao, Marina Fomicheva, Steffen Eger
Publisher: Association for Computational Linguistics
Location: Online
Date: November 2022
Type: conference publication
Anthology ID: abe-etal-2022-sentence
DOI: 10.18653/v1/2022.eval4nlp-1.8
URL: https://aclanthology.org/2022.eval4nlp-1.8/