Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM’s Nest

Letian Peng; Zilong Wang; Feng Yao; Jingbo Shang

doi:10.18653/v1/2025.acl-long.66

Abstract

Massive high-quality data, both pre-training raw texts and post-training annotations, have been carefully prepared to incubate advanced large language models (LLMs). In contrast, for information extraction (IE), pre-training data, such as BIO-tagged sequences, are hard to scale up. We show that IE models can act as free riders on LLM resources by reframing next-token prediction into extraction for tokens already present in the context. Specifically, our proposed next tokens extraction (NTE) paradigm learns a versatile IE model, Cuckoo, with 102.6M extractive data converted from LLM’s pre-training and post-training data. Under the few-shot setting, Cuckoo adapts effectively to traditional and complex instruction-following IE with better performance than existing pre-trained IE models. As a free rider, Cuckoo can naturally evolve with the ongoing advancements in LLM data preparation, benefiting from improvements in LLM training pipelines without additional manual effort.
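The reframing described in the abstract can be illustrated with a small sketch. This is not the authors' conversion pipeline, which this page does not describe. It is a minimal illustration under stated assumptions: whitespace-tokenized text, a hypothetical helper named `nte_instance`, a hypothetical `max_span` cap, and a simple B/I/O tag set. Given a context and its continuation, the sketch finds the longest prefix of the continuation that already appears in the context and tags those context tokens instead of generating them.

```python
from typing import List, Tuple


def nte_instance(
    tokens: List[str], split: int, max_span: int = 8
) -> Tuple[List[str], List[str], List[str]]:
    """Turn a next-token prediction example into an extraction example.

    Hypothetical helper for illustration only. `tokens[:split]` is the
    context and `tokens[split:]` is the continuation. Returns the context,
    the extracted span (possibly empty), and one B/I/O tag per context token.
    """
    context = tokens[:split]
    continuation = tokens[split:]

    # Find the longest prefix of the continuation that occurs in the context.
    best = 0
    for n in range(1, min(max_span, len(continuation)) + 1):
        target = continuation[:n]
        found = any(
            context[i:i + n] == target for i in range(len(context) - n + 1)
        )
        if not found:
            break
        best = n

    span = continuation[:best]
    tags = ["O"] * len(context)

    # Tag every non-overlapping occurrence of the span in the context.
    i = 0
    while best and i <= len(context) - best:
        if context[i:i + best] == span:
            tags[i] = "B"
            for j in range(i + 1, i + best):
                tags[j] = "I"
            i += best
        else:
            i += 1

    return context, span, tags


if __name__ == "__main__":
    text = (
        "Marie Curie won the Nobel Prize . "
        "The prize was awarded to Marie Curie in 1903"
    )
    toks = text.split()
    ctx, span, tags = nte_instance(toks, split=12)
    for tok, tag in zip(ctx, tags):
        print(f"{tok}\t{tag}")
    print("span:", span)
```

In this toy case the continuation begins with "Marie Curie", which already occurs at the start of the context, so those two context tokens are tagged and the rest stay `O`. When no prefix of the continuation occurs in the context, the sketch returns an empty span and all-`O` tags.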

Anthology ID: 2025.acl-long.66
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1301–1315
URL: https://aclanthology.org/2025.acl-long.66/
DOI: 10.18653/v1/2025.acl-long.66
Cite (ACL): Letian Peng, Zilong Wang, Feng Yao, and Jingbo Shang. 2025. Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM’s Nest. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1301–1315, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM’s Nest (Peng et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.66.pdf
