ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution

Xuanming Zhang; Zixun Chen; Zhou Yu

doi:10.18653/v1/2024.findings-acl.502

Abstract

Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task — language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems’ ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.

Anthology ID: 2024.findings-acl.502
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8475–8493
URL: https://aclanthology.org/2024.findings-acl.502/
DOI: 10.18653/v1/2024.findings-acl.502
Cite (ACL): Xuanming Zhang, Zixun Chen, and Zhou Yu. 2024. ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8475–8493, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution (Zhang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.502.pdf
