Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions

Kira Droganova, Filip Ginter, Jenna Kanerva, Daniel Zeman


Abstract
In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enriching training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. These data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method that mimics the structural complexity of the original treebank. Using the same methods, we also demonstrate small improvements over the winning system of the CoNLL-17 parsing shared task for four of the five languages; these improvements are not restricted to elliptical constructions.
Anthology ID:
W18-6006
Volume:
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Month:
November
Year:
2018
Address:
Brussels, Belgium
Editors:
Marie-Catherine de Marneffe, Teresa Lynn, Sebastian Schuster
Venue:
UDW
Publisher:
Association for Computational Linguistics
Pages:
47–54
URL:
https://aclanthology.org/W18-6006/
DOI:
10.18653/v1/W18-6006
Cite (ACL):
Kira Droganova, Filip Ginter, Jenna Kanerva, and Daniel Zeman. 2018. Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 47–54, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions (Droganova et al., UDW 2018)
PDF:
https://aclanthology.org/W18-6006.pdf
Data:
Universal Dependencies