Delexicalized Cross-lingual Dependency Parsing for Xibe

He Zhou, Sandra Kübler


Abstract
Manually annotating a treebank is time-consuming and labor-intensive. We conduct delexicalized cross-lingual dependency parsing experiments, where we train the parser on one language and test on our target language. As our test case, we use Xibe, a severely under-resourced Tungusic language. We assume that choosing a closely related language as the source language will provide better results than choosing a more distant relative. However, it is not clear how to determine those closely related languages. We investigate three different methods: choosing the typologically closest language, using LangRank, and choosing the most similar language based on perplexity. We train parsing models on the selected languages using UDify and test on different genres of Xibe data. The results show that languages selected based on typology and perplexity scores outperform those predicted by LangRank; Japanese is the optimal source language. In determining the source language, proximity to the target language is more important than a large training set. Parsing is also influenced by genre differences, but these have little effect as long as the training data is at least as complex as the target.
Anthology ID:
2021.ranlp-1.182
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1626–1635
URL:
https://aclanthology.org/2021.ranlp-1.182
Cite (ACL):
He Zhou and Sandra Kübler. 2021. Delexicalized Cross-lingual Dependency Parsing for Xibe. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1626–1635, Held Online. INCOMA Ltd.
Cite (Informal):
Delexicalized Cross-lingual Dependency Parsing for Xibe (Zhou & Kübler, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.182.pdf