@inproceedings{cegin-etal-2025-use,
title = "Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in {LLM}-based Text Augmentation",
author = "Cegin, Jan and
Pecher, Branislav and
Simko, Jakub and
Srba, Ivan and
Bielikova, Maria and
Brusilovsky, Peter",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.296/",
pages = "5533--5550",
ISBN = "979-8-89176-335-7",
abstract = "The generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for downstream model fine-tuning. This is useful, especially for low-resource settings. For better augmentations, LLMs are prompted with examples (few-shot scenarios). Yet, the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more ``informed'') sample selection strategies is lacking. In this work, we compare sample selection strategies existing in the few-shot learning literature and investigate their effects in LLM-based textual augmentation in a low-resource setting. We evaluate this on in-distribution and out-of-distribution model performance. Results indicate that while some ``informed'' selection strategies increase the performance of models, especially for out-of-distribution data, it happens only seldom and with marginal performance increases. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="cegin-etal-2025-use">
<titleInfo>
<title>Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jan</namePart>
<namePart type="family">Cegin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Branislav</namePart>
<namePart type="family">Pecher</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jakub</namePart>
<namePart type="family">Simko</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ivan</namePart>
<namePart type="family">Srba</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Bielikova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peter</namePart>
<namePart type="family">Brusilovsky</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>The generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for downstream model fine-tuning. This is useful, especially for low-resource settings. For better augmentations, LLMs are prompted with examples (few-shot scenarios). Yet, the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more “informed”) sample selection strategies is lacking. In this work, we compare sample selection strategies existing in the few-shot learning literature and investigate their effects in LLM-based textual augmentation in a low-resource setting. We evaluate this on in-distribution and out-of-distribution model performance. Results indicate that while some “informed” selection strategies increase the performance of models, especially for out-of-distribution data, it happens only seldom and with marginal performance increases. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners.</abstract>
<identifier type="citekey">cegin-etal-2025-use</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.296/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>5533</start>
<end>5550</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation
%A Cegin, Jan
%A Pecher, Branislav
%A Simko, Jakub
%A Srba, Ivan
%A Bielikova, Maria
%A Brusilovsky, Peter
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F cegin-etal-2025-use
%X Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for downstream model fine-tuning. This is especially useful in low-resource settings. For better augmentations, LLMs are prompted with examples (few-shot scenarios). Yet the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more “informed”) sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects on LLM-based textual augmentation in a low-resource setting. We evaluate both in-distribution and out-of-distribution model performance. Results indicate that while some “informed” selection strategies increase model performance, especially on out-of-distribution data, this happens only rarely and with marginal gains. Unless further advances are made, random sample selection remains a good default for augmentation practitioners.
%U https://aclanthology.org/2025.findings-emnlp.296/
%P 5533-5550

Markdown (Informal)
[Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation](https://aclanthology.org/2025.findings-emnlp.296/) (Cegin et al., Findings 2025)

ACL
Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, and Peter Brusilovsky. 2025. Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5533–5550, Suzhou, China. Association for Computational Linguistics.
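
For readers curious what the paper's recommended default looks like in practice, below is a minimal sketch of uniform random few-shot selection when prompting an LLM to paraphrase (augment) a seed sample. This is a hypothetical illustration, not the authors' code: the function name, prompt wording, and data format are all assumptions.

import random

def build_augmentation_prompt(seed_text, labeled_pool, num_shots=3, rng=None):
    """Pick `num_shots` exemplars uniformly at random and format a paraphrase prompt."""
    rng = rng or random.Random()
    # Random selection: the strategy the paper reports as a strong default
    # compared to more "informed" few-shot selection strategies.
    shots = rng.sample(labeled_pool, k=min(num_shots, len(labeled_pool)))
    lines = ["Paraphrase the input text. Examples:"]
    for text, label in shots:
        lines.append(f"- ({label}) {text}")
    lines.append(f"Input: {seed_text}")
    lines.append("Paraphrase:")
    return "\n".join(lines)

if __name__ == "__main__":
    pool = [("The movie was great.", "positive"),
            ("I hated the ending.", "negative"),
            ("A solid, forgettable film.", "neutral")]
    print(build_augmentation_prompt("The plot dragged on.", pool,
                                    num_shots=2, rng=random.Random(0)))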