CILex: An Investigation of Context Information for Lexical Substitution Methods

Sandaru Seneviratne, Elena Daskalaki, Artem Lenskiy, Hanna Suominen


Abstract
Lexical substitution, which aims to generate substitutes for a target word given a context, is an important natural language processing task useful in many applications. Due to the paucity of annotated data, existing methods for lexical substitution tend to rely on manually curated lexical resources and contextual word embedding models. Methods based on lexical resources are likely to miss relevant substitutes, whereas relying only on contextual word embedding models fails to provide adequate information on the impact of a substitute on the entire context and on the overall meaning of the input. We proposed CILex, which uses contextual sentence embeddings along with methods that capture additional context information complementing contextual word embeddings for lexical substitution. This ensured the semantic consistency of a substitute with the target word while maintaining the overall meaning of the sentence. Our experimental comparisons with previously proposed methods indicated that our solution is now the state of the art on both the widely used LS07 and CoInCo datasets, with P@1 scores of 55.96% and 57.25%, respectively. The implementation of the proposed approach is available at https://github.com/sandaruSen/CILex under the MIT license.
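The abstract's central idea is to score a candidate both by its similarity to the target word and by how well the whole sentence keeps its meaning after substitution. The following is a minimal sketch of that idea, not the CILex implementation (the linked repository has the authors' code). It makes several assumptions:

- Embedding vectors have already been computed by some contextual word model and some sentence-embedding model.
- The linear weight `alpha`, the helper names `cosine`, `rank_substitutes` and `precision_at_1`, and the toy vectors are all hypothetical.
- P@1 is simplified here to the fraction of instances whose top-ranked substitute appears in the gold set.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_substitutes(
    target_vec: Vector,
    candidate_vecs: Dict[str, Vector],
    original_sent_vec: Vector,
    substituted_sent_vecs: Dict[str, Vector],
    alpha: float = 0.5,
) -> List[str]:
    """Rank candidates by a weighted sum of word-level and sentence-level similarity.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    alpha * cos(target, candidate)
    + (1 - alpha) * cos(original sentence, sentence with candidate substituted)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scores = {
        cand: alpha * cosine(target_vec, vec)
        + (1.0 - alpha) * cosine(original_sent_vec, substituted_sent_vecs[cand])
        for cand, vec in candidate_vecs.items()
    }
    # Highest combined score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


def precision_at_1(predictions: List[List[str]], gold: List[Set[str]]) -> float:
    """Simplified P@1: share of instances whose top-ranked substitute is in the gold set."""
    if not predictions or len(predictions) != len(gold):
        raise ValueError("predictions must be non-empty and aligned with gold")
    hits = sum(1 for ranked, g in zip(predictions, gold) if ranked and ranked[0] in g)
    return hits / len(predictions)


# Toy 3-d vectors standing in for real embeddings (hypothetical values).
TARGET = [1.0, 0.0, 0.0]  # e.g. "bright" in "a bright student"
CANDIDATES = {"smart": [0.9, 0.1, 0.0], "shiny": [0.99, 0.0, 0.1]}
ORIGINAL_SENT = [0.0, 1.0, 0.0]
SUBSTITUTED_SENTS = {"smart": [0.1, 1.0, 0.0], "shiny": [0.0, 0.5, 0.5]}

if __name__ == "__main__":
    word_only = rank_substitutes(TARGET, CANDIDATES, ORIGINAL_SENT, SUBSTITUTED_SENTS, alpha=1.0)
    combined = rank_substitutes(TARGET, CANDIDATES, ORIGINAL_SENT, SUBSTITUTED_SENTS, alpha=0.5)
    print(word_only)  # ['shiny', 'smart']
    print(combined)   # ['smart', 'shiny']
    print(precision_at_1([combined], [{"smart", "clever"}]))  # 1.0
```

The toy values are chosen to show the effect of adding sentence-level context. With word-level similarity alone (`alpha=1.0`), "shiny" narrowly outranks "smart". Once sentence-level similarity is mixed in, "smart" moves to the top because substituting it barely changes the sentence vector. Linear interpolation is just the simplest way to combine the two signals; a real system may weight or combine them differently.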
Anthology ID: 2022.coling-1.362
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 4124–4135
URL: https://aclanthology.org/2022.coling-1.362
Cite (ACL): Sandaru Seneviratne, Elena Daskalaki, Artem Lenskiy, and Hanna Suominen. 2022. CILex: An Investigation of Context Information for Lexical Substitution Methods. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4124–4135, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): CILex: An Investigation of Context Information for Lexical Substitution Methods (Seneviratne et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.362.pdf
Code: sandarusen/cilex