Confusionset-guided Pointer Networks for Chinese Spelling Check

Dingmin Wang, Yi Tay, Li Zhong


Abstract
This paper proposes Confusionset-guided Pointer Networks for the Chinese Spelling Check (CSC) task. More concretely, our approach uses an off-the-shelf confusionset to guide character generation. To this end, our novel Seq2Seq model jointly learns to copy a correct character from the input sentence through a pointer network or to generate a character from the confusionset rather than from the entire vocabulary. We conduct experiments on three human-annotated datasets, and the results demonstrate that our proposed generative model outperforms all competitor models by a large margin of up to 20% in F1 score, achieving state-of-the-art performance on all three datasets.
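The copy-or-generate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `next_char_distribution`, the scalar copy gate `p_copy`, and the toy scores are all hypothetical, and in a real model these values would come from a trained decoder. The sketch only shows how restricting generation to a confusionset shrinks the output space compared with scoring the entire vocabulary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_char_distribution(source, pointer_scores, vocab_scores,
                           confusion_set, p_copy):
    """Mix a copy distribution with a confusionset-restricted generator.

    All arguments are illustrative assumptions:
      source          characters of the input sentence
      pointer_scores  one unnormalized score per source position
      vocab_scores    dict mapping characters to unnormalized scores
      confusion_set   characters allowed in the generate branch
      p_copy          probability of copying instead of generating
    """
    # Only confusionset members are ever scored by the generate branch.
    candidates = sorted(c for c in confusion_set if c in vocab_scores)
    if not candidates:
        p_copy = 1.0  # nothing to generate, so all mass goes to copying

    dist = {}
    # Copy branch: pointer attention over positions of the input sentence.
    for ch, p in zip(source, softmax(pointer_scores)):
        dist[ch] = dist.get(ch, 0.0) + p_copy * p

    # Generate branch: softmax over the confusionset, not the vocabulary.
    if candidates:
        gen_probs = softmax([vocab_scores[c] for c in candidates])
        for ch, p in zip(candidates, gen_probs):
            dist[ch] = dist.get(ch, 0.0) + (1.0 - p_copy) * p
    return dist


# Toy example: the misspelled character 平 in 平果 (intended: 苹果).
dist = next_char_distribution(
    source=list("平果"),
    pointer_scores=[0.2, 0.1],
    vocab_scores={"苹": 2.0, "评": 0.5, "坪": 0.1, "天": 9.0},
    confusion_set={"苹", "评", "坪"},
    p_copy=0.3,
)
best = max(dist, key=dist.get)
```

In this toy setup the character 天 has the highest raw vocabulary score, yet it never receives probability mass because it is outside the confusionset of 平. That masking effect is the point of the sketch; how the gate and scores are actually computed is left to the paper.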
Anthology ID:
P19-1578
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5780–5785
URL:
https://aclanthology.org/P19-1578
DOI:
10.18653/v1/P19-1578
Cite (ACL):
Dingmin Wang, Yi Tay, and Li Zhong. 2019. Confusionset-guided Pointer Networks for Chinese Spelling Check. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5780–5785, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Confusionset-guided Pointer Networks for Chinese Spelling Check (Wang et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1578.pdf
Video:
https://aclanthology.org/P19-1578.mp4