LLM-SLM Collaborative Framework of Idiomatic Expression Generation

Hui Gao; Changhao Song; Peng Zhang; Jing Zhang; Chang Yang; Liuxian Ge

Abstract

Idiomatic Expression Generation, which aims to produce idiomatic text from plain text, is a valuable yet challenging NLP task. However, existing methods suffer from scarce parallel data and depend on high-quality manual annotation. To address this, we propose Auto-IDEA, an iterative LLM-SLM (Large Language Model-Small Language Model) collaborative framework that replaces human supervision in idiomatic expression data generation. In this self-improving cycle, the LLM constructs parallel corpora (idiomatic and plain text) via bidirectional semantic reconstruction and automatically generates "Locate-Then-Polish" (LTP) annotations, while the SLM filters out low-quality data and continuously improves its verification ability through incremental learning. We instantiate Auto-IDEA for Chinese Idiom Polishing (CIP) and construct CIP-200K, a large-scale dataset of 206K parallel sentences with LTP annotations. Qwen3-8B fine-tuned on CIP-200K achieves a 25.2% absolute improvement in Idiom Polishing Accuracy (IPA) over a supervised fine-tuning (SFT) baseline and outperforms DeepSeek-R1 by 6.2%. Extensive experiments (e.g., Chinese idiom cloze tests and English idiom generation tasks) and human evaluations verify the generalization and effectiveness of Auto-IDEA, demonstrating a new pathway toward high-quality, annotation-free data generation through LLM-SLM collaboration.
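The abstract describes the self-improving cycle only at a high level. The following is a minimal Python sketch of how such a generate-filter-update loop could be wired, under stated assumptions: "bidirectional semantic reconstruction" is modeled as an idiomatic -> plain -> idiomatic round trip, an LTP annotation is modeled as a located span plus the idiom that polishes it, and the SLM is modeled as a score function with an incremental update hook. The names `LTPSample`, `reconstruct`, `collaborate`, the threshold, and the toy LLM/SLM stand-ins are all hypothetical; this is an illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]


@dataclass
class LTPSample:
    """A parallel pair with a Locate-Then-Polish style annotation (assumed layout)."""
    plain: str
    idiomatic: str
    span: Span  # (start, end) of the phrase to locate in `plain`
    idiom: str  # idiom that polishes the located phrase


def reconstruct(idiomatic: str, idiom: str,
                to_plain: Callable[[str, str], Tuple[str, Span]],
                to_idiomatic: Callable[[str, Span, str], str]) -> Optional[LTPSample]:
    """Round-trip check (assumed): idiomatic -> plain -> idiomatic must recover the input."""
    plain, span = to_plain(idiomatic, idiom)
    if to_idiomatic(plain, span, idiom) != idiomatic:
        return None  # reconstruction drifted, so drop the pair
    return LTPSample(plain, idiomatic, span, idiom)


def collaborate(batches: List[List[Tuple[str, str]]],
                to_plain: Callable[[str, str], Tuple[str, Span]],
                to_idiomatic: Callable[[str, Span, str], str],
                score: Callable[[LTPSample], float],
                update: Callable[[List[LTPSample]], None],
                threshold: float = 0.5) -> List[LTPSample]:
    """One round per batch: LLM-side generation, SLM-side filtering, SLM update."""
    dataset: List[LTPSample] = []
    for batch in batches:
        candidates = [reconstruct(text, idiom, to_plain, to_idiomatic)
                      for text, idiom in batch]
        accepted = [c for c in candidates
                    if c is not None and score(c) >= threshold]
        dataset.extend(accepted)
        update(accepted)  # hook for incremental learning on newly accepted samples
    return dataset


# ---- Toy stand-ins for the LLM and SLM (hypothetical, for illustration) ----
PARAPHRASES = {"spill the beans": "reveal the secret",
               "break the ice": "ease the tension"}


def toy_to_plain(idiomatic: str, idiom: str) -> Tuple[str, Span]:
    start = idiomatic.find(idiom)
    if start < 0:
        return idiomatic, (0, 0)  # idiom absent: nothing sensible to locate
    para = PARAPHRASES.get(idiom, idiom)  # unknown idiom: left unchanged
    plain = idiomatic[:start] + para + idiomatic[start + len(idiom):]
    return plain, (start, start + len(para))


def toy_to_idiomatic(plain: str, span: Span, idiom: str) -> str:
    return plain[:span[0]] + idiom + plain[span[1]:]


class ToyVerifier:
    """Rejects pairs where no rewrite happened; counts samples it has been updated on."""
    def __init__(self) -> None:
        self.seen = 0

    def score(self, sample: LTPSample) -> float:
        return 0.0 if sample.plain == sample.idiomatic else 1.0

    def update(self, accepted: List[LTPSample]) -> None:
        self.seen += len(accepted)


verifier = ToyVerifier()
batches = [
    [("He decided to spill the beans at dinner.", "spill the beans"),
     ("It was a piece of cake.", "piece of cake"),  # no paraphrase: verifier rejects
     ("She called it a day.", "break the ice")],    # idiom absent: round trip fails
    [("Let's break the ice first.", "break the ice")],
]
dataset = collaborate(batches, toy_to_plain, toy_to_idiomatic,
                      verifier.score, verifier.update)
for s in dataset:
    print(s.span, s.plain)
```

In this toy run, two of the four candidates survive: one pair is discarded by the round-trip check and one by the verifier. The split between a structural check on the LLM side and a learned score on the SLM side is a design assumption meant to mirror the division of labor the abstract describes.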

Anthology ID: 2026.acl-long.555
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12125–12145
URL: https://aclanthology.org/2026.acl-long.555/
Cite (ACL): Hui Gao, Changhao Song, Peng Zhang, Jing Zhang, Chang Yang, and Liuxian Ge. 2026. LLM-SLM Collaborative Framework of Idiomatic Expression Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12125–12145, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LLM-SLM Collaborative Framework of Idiomatic Expression Generation (Gao et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.555.pdf
Checklist: 2026.acl-long.555.checklist.pdf
