Revisiting Tri-training of Dependency Parsers

Joachim Wagner, Jennifer Foster

Abstract
We compare two orthogonal semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing. We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings. We focus on a low-resource scenario as semi-supervised learning can be expected to have the most impact here. Based on treebank size and available ELMo models, we select Hungarian, Uyghur (a zero-shot language for mBERT) and Vietnamese. Furthermore, we include English in a simulated low-resource setting. We find that pretrained word embeddings make more effective use of unlabelled data than tri-training but that the two approaches can be successfully combined.
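For orientation, the sketch below shows the classic tri-training algorithm of Zhou and Li (2005) that the paper revisits. It is a minimal, generic Python illustration, not the authors' dependency-parsing implementation from the repository linked under Code below; the scikit-learn-style fit/predict interface and all names are illustrative assumptions, and the error-rate checks of the original algorithm are omitted for brevity.

import random

def tri_train(make_learner, labelled, unlabelled, rounds=3, pool_size=1000):
    # Bootstrap three diverse learners from resampled copies of the
    # labelled data (sampling with replacement).
    learners = []
    for _ in range(3):
        sample = [random.choice(labelled) for _ in range(len(labelled))]
        model = make_learner()
        model.fit([x for x, y in sample], [y for x, y in sample])
        learners.append(model)

    for _ in range(rounds):
        retrained = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label unlabelled items on which the other two
            # learners agree; their shared prediction becomes the label.
            pool = random.sample(unlabelled, min(pool_size, len(unlabelled)))
            pseudo = []
            for x in pool:
                yj = learners[j].predict([x])[0]
                yk = learners[k].predict([x])[0]
                if yj == yk:
                    pseudo.append((x, yj))
            # Retrain learner i on gold data plus the agreed pseudo-labels.
            data = labelled + pseudo
            model = make_learner()
            model.fit([x for x, y in data], [y for x, y in data])
            retrained.append(model)
        learners = retrained
    return learners

At prediction time the three learners are typically combined by majority vote. Any scikit-learn-style classifier factory, for example one returning a fresh DecisionTreeClassifier, can serve as make_learner in this sketch.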
Anthology ID: 2021.emnlp-main.745
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9457–9473
URL: https://aclanthology.org/2021.emnlp-main.745
DOI: 10.18653/v1/2021.emnlp-main.745
Cite (ACL): Joachim Wagner and Jennifer Foster. 2021. Revisiting Tri-training of Dependency Parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9457–9473, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Revisiting Tri-training of Dependency Parsers (Wagner & Foster, EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.745.pdf
Video: https://aclanthology.org/2021.emnlp-main.745.mp4
Code: jowagner/mtb-tri-training (plus additional community code)