Adaptive Reinforcement Tuning Language Models as Hard Data Generators for Sentence Representation

Bo Xu, Yifei Wu, Shouang Wei, Ming Du, Hongya Wang


Abstract
Sentence representation learning is a fundamental task in NLP. Existing methods use contrastive learning (CL) to learn effective sentence representations. These methods benefit from high-quality contrastive data but require extensive human annotation. Large language models (LLMs) like ChatGPT and GPT-4 can automatically generate such data. However, this alternative strategy also encounters challenges: 1) obtaining high-quality generated data from small-parameter LLMs is difficult, and 2) the generated data is used inefficiently. To address these challenges, we propose a novel adaptive reinforcement tuning (ART) framework. Specifically, to address the first challenge, we introduce a reinforcement learning approach for fine-tuning small-parameter LLMs, enabling the generation of high-quality hard contrastive data without human feedback. To address the second challenge, we propose an adaptive iterative framework that guides the small-parameter LLMs to generate progressively harder samples over multiple iterations, thereby maximizing the utility of the generated data. Experiments conducted on seven semantic textual similarity tasks demonstrate that sentence representation models trained on the synthetic data generated by our proposed method achieve state-of-the-art performance. Our code is available at https://github.com/WuNein/AdaptCL.
Anthology ID:
2024.lrec-main.33
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
358–371
URL:
https://aclanthology.org/2024.lrec-main.33
Cite (ACL):
Bo Xu, Yifei Wu, Shouang Wei, Ming Du, and Hongya Wang. 2024. Adaptive Reinforcement Tuning Language Models as Hard Data Generators for Sentence Representation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 358–371, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Adaptive Reinforcement Tuning Language Models as Hard Data Generators for Sentence Representation (Xu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.33.pdf