A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection

Yiyi Wang, Chilin Shih


Abstract
This paper presents a method that combines a Conditional Random Fields (CRF) model with a post-processing layer that uses Google n-gram statistics, tailored to detect word selection and word order errors made by learners of Chinese as a Foreign Language (CFL). We describe the architecture of the model and its performance in the shared task of the ACL 2018 Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). This hybrid approach yields a comparably high false positive rate (FPR = 0.1274) and precision (Pd = 0.7519; Pi = 0.6311), but low recall (Rd = 0.3035; Ri = 0.1696), in the grammatical error detection (d) and identification (i) tasks. Additional statistical information and linguistic rules can be added to enhance the model's performance in the future.
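For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of their standard textbook definitions. The `Confusion` class, the function names, and the example counts are hypothetical and are not taken from the paper; the shared task's official scorer may differ in how it counts items at each evaluation level.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # erroneous items correctly flagged
    fp: int  # correct items wrongly flagged
    tn: int  # correct items left unflagged
    fn: int  # erroneous items missed


def false_positive_rate(c: Confusion) -> float:
    # FPR = FP / (FP + TN): share of correct items that were wrongly flagged
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


def precision(c: Confusion) -> float:
    # P = TP / (TP + FP): share of flagged items that are real errors
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Confusion) -> float:
    # R = TP / (TP + FN): share of real errors that were flagged
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


# Made-up counts, chosen only to illustrate the arithmetic.
example = Confusion(tp=3, fp=1, tn=7, fn=5)
print(false_positive_rate(example), precision(example), recall(example))
```

With these made-up counts the system flags few items, so precision is high and recall is low, which is the same qualitative pattern the abstract reports.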
Anthology ID:
W18-3728
Volume:
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Yuen-Hsien Tseng, Hsin-Hsi Chen, Vincent Ng, Mamoru Komachi
Venue:
NLP-TEA
Publisher:
Association for Computational Linguistics
Pages:
194–198
URL:
https://aclanthology.org/W18-3728
DOI:
10.18653/v1/W18-3728
Cite (ACL):
Yiyi Wang and Chilin Shih. 2018. A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection. In Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, pages 194–198, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection (Wang & Shih, NLP-TEA 2018)
PDF:
https://aclanthology.org/W18-3728.pdf