Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task

Yi Yang, Pengjun Xie, Jun Tao, Guangwei Xu, Linlin Li, Luo Si


Abstract
This paper introduces the Alibaba NLP team's system for IJCNLP 2017 Shared Task 1, Chinese Grammatical Error Diagnosis (CGED). The task is to diagnose four types of grammatical errors: redundant words (R), missing words (M), bad word selection (S), and disordered words (W). We treat the task as a sequence tagging problem and design handcrafted features to solve it. Our system is mainly based on the LSTM-CRF model, and three ensemble strategies are applied to improve performance. Our system achieves the highest F1 scores at the identification level and the position level. At the position level, which is the most difficult level, we perform best on all metrics.
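As an illustration of the sequence tagging framing mentioned in the abstract, the sketch below converts error spans into one tag per character using a BIO scheme. It is a minimal sketch and not the authors' implementation: the function name `spans_to_bio`, the BIO scheme itself, the 1-based inclusive offsets, and the toy input are all assumptions made for illustration. The abstract does not specify the tag set or the feature design.

```python
from typing import List, Tuple

# The four error types named in the abstract.
ERROR_TYPES = {"R", "M", "S", "W"}


def spans_to_bio(sentence: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) error spans into one BIO tag per character.

    Assumptions (not taken from the paper): offsets are 1-based and
    inclusive, and a span that overlaps an already tagged position is
    skipped entirely so that every I- tag follows its own B- tag.
    """
    tags = ["O"] * len(sentence)
    for start, end, err in spans:
        if err not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {err}")
        if not (1 <= start <= end <= len(sentence)):
            raise ValueError(f"span out of range: ({start}, {end})")
        positions = range(start - 1, end)  # convert to 0-based indices
        if any(tags[pos] != "O" for pos in positions):
            continue  # skip overlapping spans in this simplified sketch
        for pos in positions:
            prefix = "B-" if pos == start - 1 else "I-"
            tags[pos] = prefix + err
    return tags


# Toy input: placeholder characters stand in for a real sentence.
print(spans_to_bio("ABCDEF", [(2, 3, "S"), (5, 5, "R")]))
# → ['O', 'B-S', 'I-S', 'O', 'B-R', 'O']
```

A tagger such as an LSTM-CRF would then be trained to predict these per-character tags, and predicted tag runs can be decoded back into typed error positions.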
Anthology ID:
I17-4006
Volume:
Proceedings of the IJCNLP 2017, Shared Tasks
Month:
December
Year:
2017
Address:
Taipei, Taiwan
Editors:
Chao-Hong Liu, Preslav Nakov, Nianwen Xue
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
41–46
URL:
https://aclanthology.org/I17-4006
Cite (ACL):
Yi Yang, Pengjun Xie, Jun Tao, Guangwei Xu, Linlin Li, and Luo Si. 2017. Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task. In Proceedings of the IJCNLP 2017, Shared Tasks, pages 41–46, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task (Yang et al., IJCNLP 2017)
PDF:
https://aclanthology.org/I17-4006.pdf
Data
Alibaba Cluster Trace