Retrieval-Augmented Generation for Large Language Model based Few-shot Chinese Spell Checking

Ming Dong, Zhiwei Cheng, Changyin Luo, Tingting He


Abstract
Large language models (LLMs) are naturally suited to the Chinese spelling check (CSC) task in few-shot scenarios because of their strong semantic understanding and few-shot learning capabilities. Recent CSC research has begun to use LLMs as foundation models. However, most current datasets focus on errors introduced during text generation, with little attention paid to errors arising during modal conversion. Furthermore, existing LLM-based CSC methods often rely on fixed prompt samples, which limits the performance of LLMs. We therefore propose a framework named RagID (Retrieval-Augmented Generation and Iterative Discriminator Strategy). By combining semantic similarity search with an iterative discriminator mechanism, RagID supplies well-chosen prompt samples and reduces over-correction in LLM-based CSC. RagID performs well in few-shot scenarios. Comprehensive experiments show that RagID achieves the best performance both on a dataset that includes data from multiple domains and on a dataset containing modal conversion spelling errors. The dataset and method are available online.
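The abstract does not specify the encoder, prompt template, or number of retrieved samples, so the following is only a minimal sketch of similarity-based few-shot prompt selection, the general idea behind choosing prompt samples by semantic similarity rather than using a fixed set. It assumes character-bigram cosine similarity as a hypothetical stand-in for a semantic embedding model, and all function names, the prompt wording, and the example pool are hypothetical illustrations, not RagID's implementation. The iterative discriminator is not sketched, because the abstract gives no details about it.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Hypothetical stand-in for a semantic encoder: counts character bigrams."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(query: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (erroneous, corrected) pairs whose source is most similar to the query."""
    q = char_bigrams(query)
    ranked = sorted(pool, key=lambda pair: cosine(q, char_bigrams(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble a few-shot prompt from the retrieved pairs (hypothetical template)."""
    parts = ["Correct the spelling errors in the Chinese sentence."]
    for src, tgt in retrieve_examples(query, pool, k):
        parts.append(f"Input: {src}\nOutput: {tgt}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical example pool of (erroneous, corrected) sentence pairs.
pool = [
    ("我今天很高心", "我今天很高兴"),
    ("明天我们去公圆玩", "明天我们去公园玩"),
    ("他的成积很好", "他的成绩很好"),
]
query = "我今天非常高心"
print(build_prompt(query, pool, k=1))
```

Here the query shares the bigrams 我今, 今天 and 高心 with the first pair and none with the others, so only that pair is placed in the prompt. Swapping `char_bigrams` for a real sentence encoder would turn this lexical heuristic into a semantic search while leaving the retrieval and prompt-assembly steps unchanged.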
Anthology ID:
2025.coling-main.717
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
10767–10780
URL:
https://aclanthology.org/2025.coling-main.717/
Cite (ACL):
Ming Dong, Zhiwei Cheng, Changyin Luo, and Tingting He. 2025. Retrieval-Augmented Generation for Large Language Model based Few-shot Chinese Spell Checking. In Proceedings of the 31st International Conference on Computational Linguistics, pages 10767–10780, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Retrieval-Augmented Generation for Large Language Model based Few-shot Chinese Spell Checking (Dong et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.717.pdf