PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction

Shulin Liu, Tao Yang, Tianchi Yue, Feng Zhang, Di Wang


Abstract
Chinese spelling correction (CSC) is the task of detecting and correcting spelling errors in text. CSC is essentially a linguistic problem, so the ability to understand language is crucial to this task. In this paper, we propose a Pre-trained masked Language model with Misspelled knowledgE (PLOME) for CSC, which jointly learns how to understand language and correct spelling errors. To this end, PLOME masks the chosen tokens with similar characters according to a confusion set rather than with the fixed token “[MASK]” as in BERT. Besides character prediction, PLOME also introduces pronunciation prediction to learn misspelled knowledge at the phonetic level. Moreover, phonological and visual similarity knowledge is important to this task. PLOME utilizes GRU networks to model such knowledge based on characters’ phonics and strokes. Experiments are conducted on widely used benchmarks. Our method outperforms state-of-the-art approaches by a remarkable margin. We release the source code and pre-trained model for use by the community (https://github.com/liushulinle/PLOME).
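The abstract's key idea, replacing BERT's fixed "[MASK]" token with a phonologically or visually similar character drawn from a confusion set, can be sketched in a few lines. The confusion-set entries below are hypothetical toy examples (PLOME uses a much larger curated set, plus random and unchanged masking slots not shown here):

```python
import random

# Toy confusion set: each character maps to similar characters that a
# writer might confuse it with (illustrative entries only).
CONFUSION_SET = {
    "在": ["再"],
    "做": ["作"],
}

def confusion_mask(tokens, mask_rate=0.15, seed=0):
    """Corrupt tokens for pre-training: a selected token is replaced by a
    similar character from the confusion set instead of '[MASK]'; the
    model is then trained to recover the original character. Tokens not
    in the confusion set are left unchanged in this simplified sketch."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok in CONFUSION_SET and rng.random() < mask_rate:
            masked.append(rng.choice(CONFUSION_SET[tok]))
            labels.append(tok)      # prediction target: the original token
        else:
            masked.append(tok)
            labels.append(None)     # no prediction for this position
    return masked, labels

# With mask_rate=1.0 every confusable character is corrupted:
masked, labels = confusion_mask(list("我在做作业"), mask_rate=1.0)
# masked → ['我', '再', '作', '作', '业'], labels mark 在/做 as targets
```

Because the corrupted input looks like a naturally misspelled sentence rather than containing an artificial "[MASK]" symbol, pre-training and the downstream correction task share the same input distribution.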
Anthology ID:
2021.acl-long.233
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
2991–3000
URL:
https://aclanthology.org/2021.acl-long.233
DOI:
10.18653/v1/2021.acl-long.233
Cite (ACL):
Shulin Liu, Tao Yang, Tianchi Yue, Feng Zhang, and Di Wang. 2021. PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2991–3000, Online. Association for Computational Linguistics.
Cite (Informal):
PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction (Liu et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.233.pdf
Video:
https://aclanthology.org/2021.acl-long.233.mp4
Code:
liushulinle/plome