MDCSpell: A Multi-task Detector-Corrector Framework for Chinese Spelling Correction

Chenxi Zhu, Ziqiang Ying, Boyu Zhang, Feng Mao


Abstract
Chinese Spelling Correction (CSC) is a task to detect and correct misspelled characters in Chinese texts. CSC is challenging since many Chinese characters are visually or phonologically similar but with quite different semantic meanings. Many recent works use BERT-based language models to directly correct each character of the input sentence. However, these methods can be sub-optimal since they correct every character of the sentence only by the context which is easily negatively affected by the misspelled characters. Some other works propose to use an error detector to guide the correction by masking the detected errors. Nevertheless, these methods dampen the visual or phonological features from the misspelled characters which could be critical for correction. In this work, we propose a novel general detector-corrector multi-task framework where the corrector uses BERT to capture the visual and phonological features from each character in the raw sentence and uses a late fusion strategy to fuse the hidden states of the corrector with that of the detector to minimize the negative impact from the misspelled characters. Comprehensive experiments on benchmarks demonstrate that our proposed method can significantly outperform the state-of-the-art methods in the CSC task.
Anthology ID:
2022.findings-acl.98
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1244–1253
Language:
URL:
https://aclanthology.org/2022.findings-acl.98
DOI:
10.18653/v1/2022.findings-acl.98
Bibkey:
Cite (ACL):
Chenxi Zhu, Ziqiang Ying, Boyu Zhang, and Feng Mao. 2022. MDCSpell: A Multi-task Detector-Corrector Framework for Chinese Spelling Correction. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1244–1253, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
MDCSpell: A Multi-task Detector-Corrector Framework for Chinese Spelling Correction (Zhu et al., Findings 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.findings-acl.98.pdf