TGIF: Tree-Graph Integrated-Format Parser for Enhanced UD with Two-Stage Generic- to Individual-Language Finetuning

Tianze Shi, Lillian Lee


Abstract
We present our contribution to the IWPT 2021 shared task on parsing into enhanced Universal Dependencies. Our main system component is a hybrid tree-graph parser that integrates (a) predictions of spanning trees for the enhanced graphs with (b) additional graph edges not present in the spanning trees. We also adopt a finetuning strategy where we first train a language-generic parser on the concatenation of data from all available languages, and then, in a second step, finetune on each individual language separately. Additionally, we develop our own complete set of pre-processing modules relevant to the shared task, including tokenization, sentence segmentation, and multiword token expansion, based on pre-trained XLM-R models and our own pre-training of character-level language models. Our submission reaches a macro-average ELAS of 89.24 on the test set. It ranks first among all teams, with a margin of more than 2 absolute ELAS points over the next best-performing submission, and achieves the best score on 16 out of 17 languages.
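For illustration, the tree-graph integration described above can be sketched as a two-step decoding routine: predict a maximum spanning tree (arborescence) over per-arc scores, then add any remaining edges whose independent probability clears a threshold. The sketch below is not the authors' implementation; the score matrices `arc_scores` and `edge_probs`, the 0.5 threshold, and the use of networkx's Chu-Liu/Edmonds maximum spanning arborescence are all assumptions standing in for the system's actual scorers and decoder.

```python
# A minimal sketch (not the authors' released code) of hybrid tree-graph
# decoding: take the maximum spanning arborescence over arc scores as the
# tree backbone, then add extra edges whose probability clears a threshold.
# arc_scores and edge_probs stand in for outputs of hypothetical scorers.
import numpy as np
import networkx as nx

def decode_enhanced_graph(arc_scores, edge_probs, threshold=0.5):
    """arc_scores[h, d]: score of arc head h -> dependent d (node 0 = root).
    edge_probs[h, d]: probability that h -> d is in the enhanced graph.
    Returns the edge set of the enhanced graph."""
    n = arc_scores.shape[0]
    # Shift scores positive; a constant shift does not change the argmax
    # arborescence, since every spanning arborescence has n - 1 edges.
    w = arc_scores - arc_scores.min() + 1.0
    G = nx.DiGraph()
    for h in range(n):
        for d in range(1, n):        # the root takes no incoming edges
            if h != d:
                G.add_edge(h, d, weight=float(w[h, d]))
    # (a) tree backbone via Chu-Liu/Edmonds
    tree = nx.maximum_spanning_arborescence(G, attr="weight")
    edges = set(tree.edges())
    # (b) extra graph edges not guaranteed by the spanning tree
    for h in range(n):
        for d in range(1, n):
            if h != d and edge_probs[h, d] > threshold:
                edges.add((h, d))
    return edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5                            # root plus four words
    print(sorted(decode_enhanced_graph(rng.normal(size=(n, n)),
                                       rng.uniform(size=(n, n)))))
```

The spanning-tree step guarantees that every token is attached to the graph, which thresholded edge predictions alone would not ensure; the thresholded edges then supply the reentrancies that make the output a graph rather than a tree.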
Anthology ID:
2021.iwpt-1.23
Volume:
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP | IWPT
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
213–224
URL:
https://aclanthology.org/2021.iwpt-1.23
DOI:
10.18653/v1/2021.iwpt-1.23
PDF:
https://aclanthology.org/2021.iwpt-1.23.pdf