RACQC: Advanced Retrieval-Augmented Generation for Chinese Query Correction

Jinbo Su; Lingzhe Gao; Wei Li; Shihao Liu; Haojie Lei; Xinyi Wang; Yuanzhao Guo; Ke Wang; Daiting Shi; Dawei Yin

doi:10.18653/v1/2025.findings-emnlp.36

Abstract

In web search scenarios, erroneous queries frequently degrade the user experience through irrelevant results, underscoring the pivotal role of Chinese Spelling Check (CSC) systems. Although large language models (LLMs) exhibit remarkable capabilities across many tasks, they face two critical challenges in the CSC scenario: (1) poor generalization to rare entities in open-domain searches, and (2) failure to adapt to temporal entity variations due to static parameters, resulting in serious over-correction issues. To tackle these challenges, we present RACQC, a **C**hinese **Q**uery **C**orrection system with **R**etrieval-**A**ugmented Generation (RAG) and multi-task learning. Specifically, our approach (1) integrates dynamic knowledge retrieval through entity-centric RAG to address rare entities, supported by a newly proposed entity-title collaborative corpus, and (2) employs contrastive correction tasks to mitigate LLM over-correction tendencies. Furthermore, we propose MDCQC, a **M**ulti-**D**omain **C**hinese **Q**uery **C**orrection benchmark, to test the model’s entity correction capabilities. Extensive experiments on several datasets show that RACQC significantly outperforms existing baselines in CSC tasks. In particular, RACQC achieves a maximum improvement of +9.92% on the search scenario benchmark and +3.2% on the general-domain dataset under the F₁ metric.
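To make the reported F₁ metric and the notion of over-correction concrete, the following is a minimal sketch of a sentence-level correction score. It assumes a common CSC convention (a prediction counts as correct only if the system changed the query and the result exactly matches the reference); the abstract does not state the paper's exact evaluation protocol, and the function name `correction_f1` and the toy English strings are hypothetical illustrations, not data from the paper.

```python
from typing import List, Tuple


def correction_f1(
    sources: List[str], predictions: List[str], targets: List[str]
) -> Tuple[float, float, float]:
    """Sentence-level correction precision, recall, and F1.

    Assumed convention (not taken from the paper):
      - precision = correct edits / queries the system modified
      - recall    = correct edits / queries that actually contained an error
    """
    true_positives = 0  # modified queries that exactly match the reference
    n_modified = 0      # queries the system changed at all
    n_erroneous = 0     # queries whose reference differs from the input

    for src, pred, tgt in zip(sources, predictions, targets):
        if pred != src:
            n_modified += 1
        if tgt != src:
            n_erroneous += 1
        if pred != src and pred == tgt:
            true_positives += 1

    precision = true_positives / n_modified if n_modified else 0.0
    recall = true_positives / n_erroneous if n_erroneous else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up queries: one correct fix, one over-correction, one missed error.
sources = ["aple phone", "new york", "banna"]
targets = ["apple phone", "new york", "banana"]
predictions = ["apple phone", "new yorke", "banna"]

precision, recall, f1 = correction_f1(sources, predictions, targets)
```

Under this convention, an over-correction (editing a query that was already correct, as in the second toy example) increases the number of modified queries without adding a correct edit, so it lowers precision and therefore F₁.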

Anthology ID: 2025.findings-emnlp.36
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 675–689
URL: https://aclanthology.org/2025.findings-emnlp.36/
DOI: 10.18653/v1/2025.findings-emnlp.36
Cite (ACL): Jinbo Su, Lingzhe Gao, Wei Li, Shihao Liu, Haojie Lei, Xinyi Wang, Yuanzhao Guo, Ke Wang, Daiting Shi, and Dawei Yin. 2025. RACQC: Advanced Retrieval-Augmented Generation for Chinese Query Correction. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 675–689, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): RACQC: Advanced Retrieval-Augmented Generation for Chinese Query Correction (Su et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.36.pdf
Checklist: 2025.findings-emnlp.36.checklist.pdf
