Treebank Embedding Vectors for Out-of-Domain Dependency Parsing

Joachim Wagner, James Barry, Jennifer Foster


Abstract
A recent advance in monolingual dependency parsing is the idea of a treebank embedding vector, which allows all treebanks for a particular language to be used as training data while at the same time allowing the model to prefer training data from one treebank over others and to select the preferred treebank at test time. We build on this idea by 1) introducing a method to predict a treebank vector for sentences that do not come from a treebank used in training, and 2) exploring what happens when we move away from predefined treebank embedding vectors during test time and instead devise tailored interpolations. We show that 1) there are interpolated vectors that are superior to the predefined ones, and 2) treebank vectors can be predicted with sufficient accuracy, for nine out of ten test languages, to match the performance of an oracle approach that knows the most suitable predefined treebank embedding for the test set.
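As a rough illustration of what an interpolated treebank vector could look like, the sketch below forms a weighted combination of predefined treebank embedding vectors. It assumes that interpolation means a weighted sum with weights that add up to one; the abstract does not spell out the procedure. The function name `interpolate_treebank_vectors`, the treebank labels and the vector values are all hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List


def interpolate_treebank_vectors(
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted sum of predefined treebank vectors.

    Assumption (not from the paper): weights must sum to one, so the
    result is an affine combination of the predefined vectors.
    """
    if set(vectors) != set(weights):
        raise ValueError("vectors and weights must cover the same treebanks")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all treebank vectors must share one dimension")
    dim = dims.pop()
    # Combine the vectors component by component.
    return [
        sum(weights[name] * vectors[name][i] for name in vectors)
        for i in range(dim)
    ]


# Hypothetical three-dimensional embeddings for two treebanks.
predefined = {
    "treebank_a": [1.0, 0.0, 2.0],
    "treebank_b": [0.0, 1.0, -2.0],
}
mixed = interpolate_treebank_vectors(
    predefined, {"treebank_a": 0.75, "treebank_b": 0.25}
)
print(mixed)
```

Giving one treebank a weight of 1 and all others a weight of 0 recovers that treebank's predefined vector, so the predefined choice is a special case of this sketch.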
Anthology ID:
2020.acl-main.778
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8812–8818
URL:
https://aclanthology.org/2020.acl-main.778
DOI:
10.18653/v1/2020.acl-main.778
Cite (ACL):
Joachim Wagner, James Barry, and Jennifer Foster. 2020. Treebank Embedding Vectors for Out-of-Domain Dependency Parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8812–8818, Online. Association for Computational Linguistics.
Cite (Informal):
Treebank Embedding Vectors for Out-of-Domain Dependency Parsing (Wagner et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.778.pdf
Video:
http://slideslive.com/38929016
Code:
jowagner/tbev-prediction
Data:
Universal Dependencies