Enhancing Structure-aware Encoder with Extremely Limited Data for Graph-based Dependency Parsing

Yuanhe Tian, Yan Song, Fei Xia


Abstract
Dependency parsing is a fundamental natural language processing task that analyzes the syntactic structure of an input sentence by identifying the syntactic relations between words. To improve dependency parsing, leveraging existing dependency parsers and extra data (e.g., through semi-supervised learning) has been demonstrated to be effective, even though the final parsers are trained on inaccurate (but massive) data. In this paper, we propose a frustratingly easy approach to improve graph-based dependency parsing, where a structure-aware encoder is pre-trained on auto-parsed data by predicting the word dependencies and then fine-tuned on gold dependency trees. This differs from the usual pre-training process, which aims to predict the context words along dependency paths. Experimental results and analyses demonstrate that our approach benefits effectively and robustly from data (even noisy data) processed by different parsers: it outperforms strong baselines under different settings, with different dependency standards and model architectures used in pre-training and fine-tuning. More importantly, further analyses find that only 2K auto-parsed sentences are required to obtain improvement when pre-training a vanilla BERT-large-based parser, without requiring extra parameters.
Anthology ID:
2022.coling-1.483
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5438–5449
URL:
https://aclanthology.org/2022.coling-1.483
Cite (ACL):
Yuanhe Tian, Yan Song, and Fei Xia. 2022. Enhancing Structure-aware Encoder with Extremely Limited Data for Graph-based Dependency Parsing. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5438–5449, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Enhancing Structure-aware Encoder with Extremely Limited Data for Graph-based Dependency Parsing (Tian et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.483.pdf
Code:
synlp/dmpar
Data:
Penn Treebank, Universal Dependencies