From Selection to Generation: A Survey of LLM-based Active Learning

Yu Xia; Subhojyoti Mukherjee; Zhouhang Xie; Junda Wu; Xintong Li; Ryan Aponte; Hanjia Lyu; Joe Barrow; Hongjie Chen; Franck Dernoncourt; Branislav Kveton; Tong Yu; Ruiyi Zhang; Jiuxiang Gu; Nesreen K. Ahmed; Yu Wang; Xiang Chen; Hanieh Deilamsalehy; Sungchul Kim; Zhengmian Hu; Yue Zhao; Nedim Lipka; Seunghyun Yoon; Ting-Hao Huang; Zichao Wang; Puneet Mathur; Soumyabrata Pal; Koyel Mukherjee; Zhehao Zhang; Namyong Park; Thien Huu Nguyen; Jiebo Luo; Ryan A. Rossi; Julian McAuley

doi:10.18653/v1/2025.acl-long.708

Abstract

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.

Anthology ID: 2025.acl-long.708
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14552–14569
URL: https://aclanthology.org/2025.acl-long.708/
DOI: 10.18653/v1/2025.acl-long.708
Cite (ACL): Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen, Franck Dernoncourt, Branislav Kveton, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Sungchul Kim, Zhengmian Hu, Yue Zhao, Nedim Lipka, Seunghyun Yoon, Ting-Hao Kenneth Huang, Zichao Wang, Puneet Mathur, Soumyabrata Pal, Koyel Mukherjee, Zhehao Zhang, Namyong Park, Thien Huu Nguyen, Jiebo Luo, Ryan A. Rossi, and Julian McAuley. 2025. From Selection to Generation: A Survey of LLM-based Active Learning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14552–14569, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): From Selection to Generation: A Survey of LLM-based Active Learning (Xia et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.708.pdf