Deep Contextualized Self-training for Low Resource Dependency Parsing

Guy Rotman, Roi Reichart


Abstract
Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, which is costly and laborious to create. In this paper we propose a self-training algorithm that alleviates this annotation bottleneck by training a parser on its own output. Our Deep Contextualized Self-training (DCST) algorithm utilizes representation models trained on sequence labeling tasks that are derived from the parser’s output when applied to unlabeled data, and integrates these models with the base parser through a gating mechanism. We conduct experiments across multiple languages, both in low resource in-domain and in cross-domain setups, and demonstrate that DCST substantially outperforms traditional self-training as well as recent semi-supervised training methods.
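The abstract describes two mechanical steps: turning the trees a parser predicts on unlabeled text into per-token labels for a sequence labeling task, and mixing a learned representation back into the parser through a gate. The sketch below illustrates both under stated assumptions. The two labelings (each token's number of dependents and its distance from the root) are illustrative tree statistics chosen here, not a statement of the paper's exact task set. The scalar sigmoid gate over the concatenated vectors is likewise a simplified stand-in, and the paper's gate may be parameterized differently. All function names are hypothetical.

```python
from math import exp
from typing import List

# Assumed input convention: heads[i] is the 1-indexed head of token i+1,
# and 0 denotes the artificial ROOT (as in the CoNLL-U HEAD column).


def number_of_children(heads: List[int]) -> List[int]:
    """Illustrative labeling: how many tokens attach to each token."""
    counts = [0] * len(heads)
    for h in heads:
        if h > 0:  # attachments to ROOT have no token to credit
            counts[h - 1] += 1
    return counts


def distance_from_root(heads: List[int]) -> List[int]:
    """Illustrative labeling: depth of each token (1 = direct child of ROOT)."""
    depths = []
    for i in range(len(heads)):
        depth, node, seen = 0, i + 1, set()
        while node != 0:
            if node in seen:  # a predicted "tree" may contain a cycle
                raise ValueError("heads do not form a tree")
            seen.add(node)
            node = heads[node - 1]  # climb to the head
            depth += 1
        depths.append(depth)
    return depths


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def gate(h_base: List[float], h_aux: List[float],
         w: List[float], b: float) -> List[float]:
    """Simplified gate (assumption): one scalar g from [h_base; h_aux],
    returning g * h_base + (1 - g) * h_aux element-wise."""
    z = sum(wi * xi for wi, xi in zip(w, h_base + h_aux)) + b
    g = sigmoid(z)
    return [g * a + (1.0 - g) * c for a, c in zip(h_base, h_aux)]
```

For "She reads books" parsed as She→reads, reads→ROOT, books→reads, `heads = [2, 0, 2]`; `number_of_children(heads)` gives `[0, 2, 0]` and `distance_from_root(heads)` gives `[2, 1, 2]`. Labels of this kind, computed on automatically parsed unlabeled sentences, can serve as training targets for a sequence tagger whose encoder is later combined with the base parser's representation.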
Anthology ID:
Q19-1044
Volume:
Transactions of the Association for Computational Linguistics, Volume 7
Year:
2019
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Brian Roark, Ani Nenkova
Venue:
TACL
Publisher:
MIT Press
Pages:
695–713
URL:
https://aclanthology.org/Q19-1044/
DOI:
10.1162/tacl_a_00294
Cite (ACL):
Guy Rotman and Roi Reichart. 2019. Deep Contextualized Self-training for Low Resource Dependency Parsing. Transactions of the Association for Computational Linguistics, 7:695–713.
Cite (Informal):
Deep Contextualized Self-training for Low Resource Dependency Parsing (Rotman & Reichart, TACL 2019)
PDF:
https://aclanthology.org/Q19-1044.pdf
Code
rotmanguy/DCST
Data
Universal Dependencies