Overcoming Early Saturation on Low-Resource Languages in Multilingual Dependency Parsing

Jiannan Mao, Chenchen Ding, Hour Kaing, Hideki Tanaka, Masao Utiyama, Tadahiro Matsumoto


Abstract
UDify is a multilingual, multi-task parser fine-tuned on mBERT that achieves remarkable performance on high-resource languages. However, on low-resource languages its performance saturates early and then gradually decreases as training proceeds. This work applies a data augmentation method and conducts experiments on seven few-shot and four zero-shot languages. Unlabeled attachment scores improved on the dependency parsing tasks for the zero-shot languages, with the average score rising from 67.1% to 68.7%. Meanwhile, dependency parsing for high-resource languages and the other tasks were hardly affected. The experimental results indicate that the data augmentation method is effective for low-resource languages in multilingual dependency parsing.
Anthology ID:
2024.mwe-1.10
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Archna Bhatia, Gosse Bouma, A. Seza Doğruöz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademaker
Venues:
MWE | UDW | WS
SIGs:
SIGLEX | SIGPARSE
Publisher:
ELRA and ICCL
Pages:
63–69
URL:
https://aclanthology.org/2024.mwe-1.10
Cite (ACL):
Jiannan Mao, Chenchen Ding, Hour Kaing, Hideki Tanaka, Masao Utiyama, and Tadahiro Matsumoto. 2024. Overcoming Early Saturation on Low-Resource Languages in Multilingual Dependency Parsing. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 63–69, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Overcoming Early Saturation on Low-Resource Languages in Multilingual Dependency Parsing (Mao et al., MWE-UDW-WS 2024)
PDF:
https://aclanthology.org/2024.mwe-1.10.pdf