Evaluating LLMs’ Capability to Identify Lexical Semantic Equivalence: Probing with the Word-in-Context Task

Yoshihiko Hayashi


Abstract
This study proposes a method to evaluate the capability of large language models (LLMs) to identify lexical semantic equivalence. The Word-in-Context (WiC) task, a benchmark designed to determine whether the meanings of a target word remain identical across different contexts, is employed as a probing task. Experiments are conducted with several LLMs, including proprietary GPT models and open-source models, using zero-shot prompting with adjectives that represent varying levels of semantic equivalence (e.g., “the same”) or inequivalence (e.g., “different”). The fundamental capability to identify lexical semantic equivalence in context is measured using standard accuracy metrics. Consistency across different levels of semantic equivalence is assessed via rank correlation with the expected canonical ranking of precision and recall, reflecting anticipated trends in performance across prompts. The proposed method proves effective and highlights the superior capability of GPT-4o, which consistently outperforms the other LLMs explored. Analyses of the WiC dataset, of the discriminative properties of adjectives (i.e., their ability to differentiate between levels of semantic equivalence), and of linguistic patterns in erroneous cases offer insights into the LLMs’ capabilities and sensitivities. These findings could inform improvements in WiC task performance, although performance enhancement is not the primary focus of this study.
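The probing setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper’s implementation: the prompt wording, the adjective lists, and the ask_llm() callback are assumptions introduced here for concreteness.

```python
# Minimal sketch of the WiC probing setup outlined in the abstract.
# The prompt template, adjective lists, and ask_llm() are illustrative
# assumptions, not the paper's exact prompts or code.
from typing import Callable, List, Tuple
from scipy.stats import spearmanr

# Adjectives expressing graded equivalence / inequivalence (assumed examples).
EQUIVALENCE_ADJECTIVES = ["identical", "the same", "similar"]
INEQUIVALENCE_ADJECTIVES = ["different", "unrelated"]

PROMPT_TEMPLATE = (
    "Sentence 1: {s1}\n"
    "Sentence 2: {s2}\n"
    'Is the meaning of the word "{word}" {adjective} in the two sentences? '
    "Answer Yes or No."
)

def probe_wic(
    items: List[Tuple[str, str, str, bool]],   # (sentence 1, sentence 2, target word, gold label)
    adjective: str,
    negate: bool,                              # True for inequivalence adjectives
    ask_llm: Callable[[str], str],             # user-supplied zero-shot LLM call
) -> Tuple[float, float, float]:
    """Run one prompt variant over a WiC dataset; return accuracy, precision, recall."""
    tp = fp = fn = correct = 0
    for s1, s2, word, gold in items:
        prompt = PROMPT_TEMPLATE.format(s1=s1, s2=s2, word=word, adjective=adjective)
        said_yes = ask_llm(prompt).strip().lower().startswith("yes")
        # With "different"-style adjectives, "Yes" means the senses differ,
        # so the polarity of the prediction is flipped.
        pred = (not said_yes) if negate else said_yes
        correct += int(pred == gold)
        tp += int(pred and gold)
        fp += int(pred and not gold)
        fn += int(not pred and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(items), precision, recall

def consistency(observed_scores: List[float], canonical_rank: List[int]) -> float:
    """Spearman rank correlation between per-adjective scores and the
    expected (canonical) ranking across prompt variants."""
    rho, _ = spearmanr(observed_scores, canonical_rank)
    return rho
```

The canonical ranking passed to consistency() would encode the anticipated trend that stricter equivalence wordings yield different precision/recall trade-offs than looser ones; the exact ranking and prompt formulations used in the paper are not reproduced here.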
Anthology ID:
2025.coling-main.466
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
6985–6998
URL:
https://aclanthology.org/2025.coling-main.466/
Cite (ACL):
Yoshihiko Hayashi. 2025. Evaluating LLMs’ Capability to Identify Lexical Semantic Equivalence: Probing with the Word-in-Context Task. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6985–6998, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Evaluating LLMs’ Capability to Identify Lexical Semantic Equivalence: Probing with the Word-in-Context Task (Hayashi, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.466.pdf