Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension

Lin Jieyu, Chen Honghua, Ding Nai


Abstract
It is a fundamental challenge to evaluate whether a model can truly capture the meaning of sentences. Evaluation of whether a model captures the meaning of individual words well, however, can be effectively achieved by analyzing whether the model encodes words in a vector space where semantically similar words form clusters. Inspired by this approach, we propose the Sentence-Space Metrics (SSM) to evaluate model interpretation of sentences, where the sentence space is constructed based on the pairwise entailment relationships between all sentence pairs within a sentence pool. We use three metrics to evaluate a sentence space, i.e., (1) sparsity, (2) clustering of related sentences, and (3) similarity with the sentence space measured from humans. The SSM is applied to evaluate 20 models, including ChatGPT, 18 BERT-family models fine-tuned for the Natural Language Inference (NLI) task, as well as SimCSE, a sentence representation model. The SSM reveals dramatic differences among models: although all models achieve high accuracy on standard NLI datasets such as MNLI, none of them mirrors human behavior under the SSM. These results demonstrate that, compared with traditional accuracy measures, the SSM considers pairwise relationships between hundreds of sentences and therefore provides a more fine-grained evaluation of model interpretation of sentences.
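The three metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the sentence space is taken to be an n-by-n matrix of entailment scores in [0, 1], sparsity is the fraction of off-diagonal scores below a hypothetical threshold, clustering is the gap between mean within-group and between-group scores, and similarity with humans is a Pearson correlation over off-diagonal entries. The function names and toy data are hypothetical and not taken from the paper.

```python
from itertools import permutations
from statistics import mean


def off_diagonal(matrix):
    """Return all entries matrix[i][j] with i != j, in a fixed order."""
    n = len(matrix)
    return [matrix[i][j] for i, j in permutations(range(n), 2)]


def sparsity(matrix, threshold=0.5):
    """Assumed definition: share of sentence pairs with no entailment."""
    values = off_diagonal(matrix)
    return sum(v < threshold for v in values) / len(values)


def clustering(matrix, groups):
    """Assumed definition: mean within-group score minus mean between-group score."""
    within, between = [], []
    for i, j in permutations(range(len(matrix)), 2):
        target = within if groups[i] == groups[j] else between
        target.append(matrix[i][j])
    return mean(within) - mean(between)


def similarity(model_matrix, human_matrix):
    """Assumed definition: Pearson correlation of off-diagonal entries."""
    x, y = off_diagonal(model_matrix), off_diagonal(human_matrix)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / (var_x * var_y) ** 0.5


# Toy sentence pool: four sentences in two hypothetical groups of related sentences.
groups = [0, 0, 1, 1]
model = [
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.0, 0.1, 1.0, 0.7],
    [0.1, 0.0, 0.9, 1.0],
]
human = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

print(sparsity(model), clustering(model, groups), similarity(model, human))
```

In this toy setting, a model whose entailment scores concentrate inside groups of related sentences gets a high clustering value and a high correlation with the human matrix; the definitions used in the paper may differ.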
Anthology ID:
2024.ccl-1.103
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1334–1350
Language:
English
URL:
https://aclanthology.org/2024.ccl-1.103/
Cite (ACL):
Lin Jieyu, Chen Honghua, and Ding Nai. 2024. Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1334–1350, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension (Jieyu et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.103.pdf