Chunk-based Chinese Spelling Check with Global Optimization

Zuyi Bao, Chen Li, Rui Wang


Abstract
Chinese spelling check is a challenging task due to the characteristics of the Chinese language, such as its large character set, lack of word boundaries, and short word length. On the one hand, most previous work considers only corrections with similar pronunciation or shape, and so fails to correct typos that are visually and phonologically unrelated to the intended character. On the other hand, pipeline-style architectures are widely adopted to handle different types of spelling errors in separate modules, which makes the overall system difficult to optimize. To address these issues, in this work, 1) we extend the traditional confusion sets with semantic candidates to cover different types of errors; 2) we propose a chunk-based framework to correct single-character and multi-character word errors uniformly; and 3) we adopt a global optimization strategy to enable sentence-level correction selection. The experimental results show that the proposed approach achieves new state-of-the-art performance on three benchmark datasets, as well as on an optical character recognition dataset.
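The idea of treating single-character and multi-character corrections uniformly and choosing among them at the sentence level can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the function correct_sentence, the confusion dictionary, the toy_lexicon, and the toy_score function are all hypothetical names and values invented for illustration, and the abstract does not specify the actual candidate sources, scoring model, or decoding procedure.

```python
from typing import Callable, Dict, List, Tuple

NEG_INF = float("-inf")


def correct_sentence(
    sentence: str,
    confusion: Dict[str, List[str]],
    score: Callable[[str, str], float],
    max_chunk_len: int = 2,
) -> str:
    """Choose the chunk segmentation and corrections with the best total score.

    Hypothetical sketch: chunks of 1..max_chunk_len characters are treated
    uniformly, and the selection is made over the whole sentence.
    """
    n = len(sentence)
    # best[i] holds (total score, corrected chunks) for the prefix sentence[:i].
    best: List[Tuple[float, List[str]]] = [(NEG_INF, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_chunk_len), end):
            prev_score, prev_chunks = best[start]
            if prev_score == NEG_INF:
                continue
            original = sentence[start:end]
            # Leaving the chunk unchanged is always one of the candidates.
            for candidate in [original] + confusion.get(original, []):
                total = prev_score + score(original, candidate)
                if total > best[end][0]:
                    best[end] = (total, prev_chunks + [candidate])
    return "".join(best[n][1])


# Toy resources (assumptions for illustration only).
toy_lexicon = {"喜欢": 2.0, "苹果": 2.0}
toy_confusion = {"平": ["苹"], "平果": ["苹果"]}


def toy_score(original: str, candidate: str) -> float:
    # Reward known words; charge a small penalty for changing the input.
    reward = toy_lexicon.get(candidate, 0.0)
    return reward - 0.5 if candidate != original else reward


corrected = correct_sentence("我喜欢吃平果", toy_confusion, toy_score)
unchanged = correct_sentence("我喜欢吃苹果", toy_confusion, toy_score)
print(corrected)
print(unchanged)
```

In this toy setup the multi-character candidate wins because its score is accumulated over the whole sentence rather than decided character by character. A real system would replace toy_score with a learned or language-model-based score and would draw candidates from much larger confusion sets.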
Anthology ID:
2020.findings-emnlp.184
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2031–2040
URL:
https://aclanthology.org/2020.findings-emnlp.184
DOI:
10.18653/v1/2020.findings-emnlp.184
Cite (ACL):
Zuyi Bao, Chen Li, and Rui Wang. 2020. Chunk-based Chinese Spelling Check with Global Optimization. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2031–2040, Online. Association for Computational Linguistics.
Cite (Informal):
Chunk-based Chinese Spelling Check with Global Optimization (Bao et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.184.pdf