Multi-Source Cross-Lingual Constituency Parsing

Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura


Abstract
Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information. This work further investigates the cross-lingual transfer ability of these models for constituency parsing and focuses on multi-source transfer. To address the diversity of structures and label sets across treebanks, we propose integrating typological features into the parsing model and normalizing the treebanks. We train the model on eight languages with diverse structures and apply transfer parsing to six additional low-resource languages. The experimental results show that treebank normalization is essential for cross-lingual transfer performance, and that the typological features introduce further improvement. As a result, our approach improves the baseline F1 of multi-source transfer by 5 points on average.
Anthology ID:
2021.icon-main.41
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
341–346
URL:
https://aclanthology.org/2021.icon-main.41
Cite (ACL):
Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, and Satoshi Nakamura. 2021. Multi-Source Cross-Lingual Constituency Parsing. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 341–346, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Multi-Source Cross-Lingual Constituency Parsing (Kaing et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.41.pdf
Data
Penn Treebank