Chinese and English Elementary Discourse Units Segmentation based on Bi-LSTM-CRF Model

Li Yancui, Lai Chunxiao, Feng Jike, Feng Hongyu


Abstract
Elementary Discourse Unit (EDU) recognition is a basic task in discourse analysis, and a Chinese-English aligned discourse corpus is helpful for studying it. This paper first builds a Chinese-English parallel discourse corpus in which EDUs are annotated and aligned. We then present a Bi-LSTM-CRF framework for EDU recognition that uses word embeddings, POS, and syntactic features. The results show that F1 is about 2% higher than that of the traditional method. Compared with CRF alone and Bi-LSTM alone, the Bi-LSTM-CRF model combines the advantages of both and obtains satisfactory results for Chinese and English EDU recognition. The feature contribution experiment shows that using all features together gives the best result and that the syntactic feature outperforms the other features.
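As a rough illustration of the CRF layer named in the abstract, the sketch below runs Viterbi decoding over per-token tag scores and then groups tokens into EDUs. It is not the authors' implementation. The B/I tag set, the function names `viterbi` and `tags_to_edus`, and all numeric scores are assumptions made for this example; in a Bi-LSTM-CRF model the emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from typing import Dict, List, Tuple

# Assumed tag set: B = token starts a new EDU, I = token continues the current EDU.
TAGS = ["B", "I"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model."""
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def tags_to_edus(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Group tokens into EDUs, opening a new unit at every B tag."""
    edus: List[List[str]] = []
    for token, tag in zip(tokens, tags):
        if tag == "B" or not edus:
            edus.append([token])
        else:
            edus[-1].append(token)
    return edus


# Hypothetical scores, hand-picked for illustration only.
tokens = ["He", "left", "because", "it", "rained"]
emissions = [
    {"B": 2.0, "I": 0.0},
    {"B": 0.1, "I": 1.0},
    {"B": 1.2, "I": 0.8},
    {"B": 0.9, "I": 1.0},
    {"B": 0.0, "I": 1.5},
]
transitions = {("B", "B"): -1.0, ("B", "I"): 0.5, ("I", "B"): 0.0, ("I", "I"): 0.3}

tags = viterbi(emissions, transitions)
edus = tags_to_edus(tokens, tags)
print(tags)  # ['B', 'I', 'B', 'I', 'I']
print(edus)  # [['He', 'left'], ['because', 'it', 'rained']]
```

The transition scores are what distinguish this from picking the best tag per token independently: at "it", the B and I emissions are nearly tied, and the penalty on consecutive B tags pushes the decoder toward continuing the EDU opened at "because".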
Anthology ID:
2020.ccl-1.99
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Editors:
Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1068–1078
Language:
English
URL:
https://aclanthology.org/2020.ccl-1.99
Cite (ACL):
Li Yancui, Lai Chunxiao, Feng Jike, and Feng Hongyu. 2020. Chinese and English Elementary Discourse Units Segmentation based on Bi-LSTM-CRF Model. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 1068–1078, Haikou, China. Chinese Information Processing Society of China.
Cite (Informal):
Chinese and English Elementary Discourse Units Segmentation based on Bi-LSTM-CRF Model (Li et al., CCL 2020)
PDF:
https://aclanthology.org/2020.ccl-1.99.pdf