Generating Diverse Training Samples for Relation Extraction with Large Language Models

Zexuan Li; Hongliang Dai; Piji Li (李丕绩)

doi:10.18653/v1/2025.acl-long.35


Abstract

Using Large Language Models (LLMs) to generate training data can potentially be a preferable way to improve performance on zero- or few-shot NLP tasks. However, many problems in this direction remain to be investigated. For the task of Relation Extraction (RE), we find that samples generated by directly prompting LLMs tend to be structurally very similar to one another: they use a limited variety of phrasing to express the relation between a pair of entities. Therefore, in this paper, we study how to effectively improve the diversity of the training samples generated with LLMs for RE while also maintaining their correctness. We first try to make the LLMs produce dissimilar samples by directly giving instructions in In-Context Learning (ICL) prompts. Then, we propose an approach to fine-tune LLMs for diverse training sample generation through Direct Preference Optimization (DPO). Our experiments on commonly used RE datasets show that both attempts can improve the quality of the generated training data. We also find that, compared with directly performing RE with an LLM, training a non-LLM RE model on the LLM's generated samples may lead to better performance.
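To make the DPO idea more concrete, the following is a minimal sketch, not the authors' method. It assumes (1) word-bigram Jaccard overlap as a crude stand-in for the "structural similarity" the abstract describes, (2) a hypothetical pairing rule that marks the candidate least similar to previously generated samples as "chosen" and the most similar one as "rejected", and (3) a default `beta=0.1`. All function names here are illustrative. The loss itself is the standard DPO objective, -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). Checking each sample for correctness, which the abstract says must also be maintained, is omitted.

```python
import math
from typing import List, Set, Tuple


def ngrams(text: str, n: int = 2) -> Set[tuple]:
    """Word n-grams of a sample; an assumed proxy for its phrasing structure."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[tuple], b: Set[tuple]) -> float:
    """Jaccard overlap of two n-gram sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_similarity(candidate: str, pool: List[str], n: int = 2) -> float:
    """Average similarity of a candidate to samples generated so far."""
    if not pool:
        return 0.0
    c = ngrams(candidate, n)
    return sum(jaccard(c, ngrams(p, n)) for p in pool) / len(pool)


def make_preference_pair(candidates: List[str], pool: List[str]) -> Tuple[str, str]:
    """Hypothetical rule: least similar candidate is chosen, most similar is rejected."""
    ranked = sorted(candidates, key=lambda s: mean_similarity(s, pool))
    return ranked[0], ranked[-1]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-beta * margin)


if __name__ == "__main__":
    pool = ["Paris is the capital of France .", "Berlin is the capital of Germany ."]
    candidates = [
        "Rome is the capital of Italy .",
        "The Italian government sits in Rome , its capital city .",
    ]
    chosen, rejected = make_preference_pair(candidates, pool)
    print("chosen:  ", chosen)
    print("rejected:", rejected)
    print("loss:", dpo_loss(-9.0, -11.0, -10.0, -10.0))
```

In this toy run, the template-like "X is the capital of Y ." candidate shares three bigrams with each pooled sample, so it is rejected. The differently phrased sentence shares none, so it is chosen. Raising the policy's log-probability of the chosen sample relative to the reference model pushes the loss below log 2.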

Anthology ID:: 2025.acl-long.35
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 713–726
URL:: https://aclanthology.org/2025.acl-long.35/
DOI:: 10.18653/v1/2025.acl-long.35
Cite (ACL):: Zexuan Li, Hongliang Dai, and Piji Li. 2025. Generating Diverse Training Samples for Relation Extraction with Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 713–726, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Generating Diverse Training Samples for Relation Extraction with Large Language Models (Li et al., ACL 2025)
PDF:: https://aclanthology.org/2025.acl-long.35.pdf
