Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn


Abstract
Providing technologies to communities or domains where training data is scarce or protected (e.g., for privacy reasons) is becoming increasingly important. To that end, we generalise methods for unsupervised transfer from multiple input models for structured prediction. We show that the means of aggregating over the input models is critical: multiplying the marginal probabilities of substructures to obtain high-probability structures for distant supervision is substantially better than taking the union of such structures over the input models, as done in prior work. Testing on 18 languages, we demonstrate that the method works in a cross-lingual setting on two structured prediction problems, dependency parsing and part-of-speech tagging. Our analyses show that the proposed method produces less noisy labels for the distant supervision.
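The contrast between the two aggregation strategies can be illustrated with a toy sketch for a single token in a tagging task. This is not the authors' implementation: the function names, the cumulative-mass threshold used to define a "high-probability" label set, and the numbers are all assumptions made for illustration. It only shows why multiplying marginals across input models tends to yield a smaller, less noisy candidate set than taking the union of each model's own candidates.

```python
from math import prod


def top_mass_set(probs, threshold=0.9):
    """Smallest set of label indices whose total probability reaches `threshold`.

    The cumulative-mass rule is an illustrative assumption, not the paper's rule.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = set(), 0.0
    for i in ranked:
        chosen.add(i)
        mass += probs[i]
        if mass >= threshold:
            break
    return chosen


def union_candidates(model_marginals, threshold=0.9):
    """Union of each input model's own high-probability label set."""
    candidates = set()
    for probs in model_marginals:
        candidates |= top_mass_set(probs, threshold)
    return candidates


def product_candidates(model_marginals, threshold=0.9):
    """High-probability label set under the renormalised product of marginals."""
    n_labels = len(model_marginals[0])
    scores = [prod(m[k] for m in model_marginals) for k in range(n_labels)]
    total = sum(scores)
    return top_mass_set([s / total for s in scores], threshold)


# Hypothetical marginals for one token over tags [NOUN, VERB, ADJ],
# one row per input (source-language) model.
marginals = [
    [0.70, 0.25, 0.05],
    [0.65, 0.05, 0.30],
    [0.80, 0.15, 0.05],
]

union_set = union_candidates(marginals)      # every tag some model finds plausible
product_set = product_candidates(marginals)  # only tags the models jointly support

print("union:  ", sorted(union_set))
print("product:", sorted(product_set))
```

In this toy case the union keeps all three tags because each model's second choice differs, while the product keeps only the tag that all models agree on. The same idea would apply to dependency arcs by replacing tag marginals with arc marginals.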
Anthology ID:
2022.naacl-main.149
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2041–2054
URL:
https://aclanthology.org/2022.naacl-main.149
DOI:
10.18653/v1/2022.naacl-main.149
Cite (ACL):
Kemal Kurniawan, Lea Frermann, Philip Schulz, and Trevor Cohn. 2022. Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2041–2054, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data (Kurniawan et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.149.pdf
Code:
kmkurn/uxtspwsd