Reusable Phrase Extraction Based on Syntactic Parsing

Xuemin Duan, Zan Hongying, Xiaojing Bai, Christoph Zähner


Abstract
The Academic Phrasebank is an important resource for academic writers. Student writers use its phrases to organize their research articles and improve their writing. Because of its limited size, however, the Academic Phrasebank cannot meet every academic writing need, and a large amount of academic phraseology remains in authentic research articles. In this paper, we propose an academic phraseology extraction model based on constituency parsing and dependency parsing, which automatically extracts academic phraseology similar to the phrases of the Academic Phrasebank from unlabelled research articles. The model has three main components: an academic phraseology corpus module, a sentence simplification module, and a syntactic parsing module. We created a corpus of academic phraseology of 2,129 words to help judge whether a word is neutral and general, and built two datasets under two scenarios to verify the feasibility of the proposed model.
Anthology ID:
2020.ccl-1.108
Original:
2020.ccl-1.108v1
Version 2:
2020.ccl-1.108v2
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Editors:
Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1166–1171
Language:
English
URL:
https://aclanthology.org/2020.ccl-1.108
Cite (ACL):
Xuemin Duan, Zan Hongying, Xiaojing Bai, and Christoph Zähner. 2020. Reusable Phrase Extraction Based on Syntactic Parsing. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 1166–1171, Haikou, China. Chinese Information Processing Society of China.
Cite (Informal):
Reusable Phrase Extraction Based on Syntactic Parsing (Duan et al., CCL 2020)
PDF:
https://aclanthology.org/2020.ccl-1.108.pdf