82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models

Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, Sara Stymne


Abstract
We present the Uppsala system for the CoNLL 2018 Shared Task on universal dependency parsing. Our system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a single parsing model for each treebank, we trained models with multiple treebanks for one language or closely related languages, greatly reducing the number of models. On the official test run, we ranked 7th of 27 teams for the LAS and MLAS metrics. Our system obtained the best scores overall for word segmentation, universal POS tagging, and morphological features.
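The idea of training one model per group of treebanks rather than one per treebank can be illustrated with a minimal sketch. It assumes treebank identifiers of the form `language_treebank` (for example `sv_talbanken`) and groups them by language prefix. The function name `group_treebanks`, the optional `related` mapping, and the sample identifiers are hypothetical and chosen for illustration; this is not the grouping procedure used by the authors.

```python
from collections import defaultdict


def group_treebanks(treebank_ids, related=None):
    """Group treebank ids such as 'sv_talbanken' into model groups.

    `related` is an optional, hypothetical mapping from a language code
    to a shared group name, for languages that should share one model.
    """
    related = related or {}
    groups = defaultdict(list)
    for tb in treebank_ids:
        # The language code is assumed to be the part before the first underscore.
        lang = tb.split("_", 1)[0]
        groups[related.get(lang, lang)].append(tb)
    return dict(groups)


# Illustrative identifiers only; not the shared task's full treebank list.
treebanks = ["sv_talbanken", "sv_lines", "sv_pud", "en_ewt", "en_gum", "fi_tdt"]
models = group_treebanks(treebanks)
print(len(treebanks), "treebanks ->", len(models), "models")
```

Each key in `models` would correspond to one trained model, so the number of models is bounded by the number of groups rather than the number of treebanks.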
Anthology ID:
K18-2011
Original:
K18-2011v1
Version 2:
K18-2011v2
Volume:
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Daniel Zeman, Jan Hajič
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
113–123
URL:
https://aclanthology.org/K18-2011
DOI:
10.18653/v1/K18-2011
Cite (ACL):
Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, and Sara Stymne. 2018. 82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 113–123, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models (Smith et al., CoNLL 2018)
PDF:
https://aclanthology.org/K18-2011.pdf