TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing

Jenna Kanerva, Juhani Luotolahti, Filip Ginter


Abstract
We present the TurkuNLP entry in the CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. The system is based on the UDPipe parser, and our focus is on exploring various techniques for pre-training the word embeddings used by the parser in order to improve its performance, especially on languages with small training sets. The system ranked 11th among the 33 participants overall: 8th on the small treebanks, 10th on the large treebanks, 12th on the parallel test sets, and 26th on the surprise languages.
Anthology ID:
K17-3012
Volume:
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Jan Hajič, Dan Zeman
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
119–125
URL:
https://aclanthology.org/K17-3012
DOI:
10.18653/v1/K17-3012
Cite (ACL):
Jenna Kanerva, Juhani Luotolahti, and Filip Ginter. 2017. TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 119–125, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing (Kanerva et al., CoNLL 2017)
PDF:
https://aclanthology.org/K17-3012.pdf