Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation

Kaiyu Huang, Hao Yu, Junpeng Liu, Wei Liu, Jingxiang Cao, Degen Huang


Abstract
Precise word-boundary information can alleviate lexical ambiguity and thereby improve the performance of natural language processing (NLP) tasks, which makes Chinese word segmentation (CWS) a fundamental task in NLP. With the development of pre-trained language models (PLMs), pre-trained knowledge helps neural methods solve the main problems of CWS to a significant degree, and existing methods already achieve high performance on several benchmarks (e.g., Bakeoff-2005). However, recent strong studies remain limited by small-scale annotated corpora. To further improve CWS methods based on fine-tuning PLMs, we propose a novel neural framework, LBGCN, which incorporates a lexicon-based graph convolutional network into the Transformer encoder. Experimental results on five benchmarks and four cross-domain datasets show that the lexicon-based graph convolutional network successfully captures information about candidate words and helps to improve performance on the benchmarks (Bakeoff-2005 and CTB6) and the cross-domain datasets (SIGHAN-2010). Further experiments and analyses demonstrate that the proposed framework effectively models the lexicon, enhancing the ability of basic neural frameworks and strengthening robustness in the cross-domain scenario.
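As a rough illustration of the general idea named in the abstract (a graph over characters derived from lexicon matches, followed by a graph convolution), the following minimal sketch may help. It is not the authors' LBGCN implementation: the graph construction rule (linking characters that share a candidate word), the toy lexicon, the placeholder features and weights, and all function names are assumptions made for illustration only.

```python
def candidate_words(sentence, lexicon, max_len=4):
    """Return (start, end) spans of multi-character substrings found in the lexicon."""
    spans = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            if sentence[start:end] in lexicon:
                spans.append((start, end))
    return spans


def build_adjacency(n, spans):
    """Link every pair of characters covered by the same candidate word (plus self-loops)."""
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for start, end in spans:
        for r in range(start, end):
            for c in range(start, end):
                adj[r][c] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCN layers."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[r][c] / (deg[r] ** 0.5 * deg[c] ** 0.5) for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}  # toy lexicon (assumption)
spans = candidate_words(sentence, lexicon)
adj_norm = normalize(build_adjacency(len(sentence), spans))

# Placeholder character features; in a PLM-based model these would be encoder states.
features = [[1.0, float(i)] for i in range(len(sentence))]
weight = [[0.5, -0.5], [0.25, 0.25]]  # arbitrary illustrative weights
output = gcn_layer(adj_norm, features, weight)
```

In this sketch each character's output vector mixes information from the characters that co-occur with it in some candidate word, which is one simple way lexicon matches could inform per-character segmentation decisions.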
Anthology ID:
2021.findings-emnlp.248
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
EMNLP | Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2908–2917
URL:
https://aclanthology.org/2021.findings-emnlp.248
DOI:
10.18653/v1/2021.findings-emnlp.248
Cite (ACL):
Kaiyu Huang, Hao Yu, Junpeng Liu, Wei Liu, Jingxiang Cao, and Degen Huang. 2021. Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2908–2917, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation (Huang et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.248.pdf
Software:
 2021.findings-emnlp.248.Software.zip