Uncertainty-Aware Cross-Lingual Transfer with Pseudo Partial Labels

Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Chang-Tien Lu


Abstract
Large-scale multilingual pre-trained language models have achieved remarkable performance on zero-shot cross-lingual tasks. A recent study has demonstrated the effectiveness of a self-learning-based approach to cross-lingual transfer, which requires only unlabeled data in the target languages and no effort to annotate gold labels for them. However, this approach suffers from noisy training caused by incorrectly pseudo-labeled samples. In this work, we propose an uncertainty-aware Cross-Lingual Transfer framework with Pseudo-Partial-Label (CLTP) that maximizes the utilization of unlabeled data by reducing the noise introduced in the training phase. To estimate a pseudo-partial-label for each unlabeled sample, we propose a novel estimation method that considers both the prediction confidence and a limit on the number of similar labels. Extensive experiments on two cross-lingual tasks, Named Entity Recognition (NER) and Natural Language Inference (NLI), across 40 languages show that our method outperforms the baselines on both high-resource and low-resource languages, for example by 6.9 on Kazakh (kk) and 5.2 on Marathi (mr) for NER.
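The idea of a pseudo-partial-label can be illustrated with a minimal sketch. This is not the authors' estimation method: the function names, the cumulative-probability rule, and the `confidence` and `max_labels` parameters are all assumptions chosen only to show how a candidate label set might combine prediction confidence with a cap on the number of labels.

```python
import math


def pseudo_partial_label(probs, confidence=0.9, max_labels=3):
    """Hypothetical rule (an assumption, not the paper's estimator).

    Take labels in order of decreasing predicted probability until their
    cumulative probability reaches `confidence`, but never keep more than
    `max_labels` candidates. Returns (candidate indices, covered mass).
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    candidates, mass = [], 0.0
    for i in ranked:
        if len(candidates) >= max_labels:
            break  # cap on the size of the candidate set
        candidates.append(i)
        mass += probs[i]
        if mass >= confidence:
            break  # the set is already confident enough
    return candidates, mass


def partial_label_loss(probs, candidates):
    """Negative log of the total probability assigned to the candidate set.

    The loss is low whenever the model puts its mass anywhere inside the
    set, so a wrong top-1 guess is penalized less than with a hard label.
    """
    return -math.log(sum(probs[i] for i in candidates))


if __name__ == "__main__":
    uncertain = [0.5, 0.3, 0.15, 0.05]   # ambiguous prediction
    confident = [0.97, 0.02, 0.01]       # near-certain prediction
    print(pseudo_partial_label(uncertain))
    print(pseudo_partial_label(confident))
```

Under these assumptions, a confident prediction collapses to an ordinary single pseudo-label, while an ambiguous one keeps several candidates, which is the intuition behind reducing noise from incorrect hard pseudo-labels.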
Anthology ID:
2022.findings-naacl.153
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1987–1997
URL:
https://aclanthology.org/2022.findings-naacl.153
DOI:
10.18653/v1/2022.findings-naacl.153
Cite (ACL):
Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, and Chang-Tien Lu. 2022. Uncertainty-Aware Cross-Lingual Transfer with Pseudo Partial Labels. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1987–1997, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Uncertainty-Aware Cross-Lingual Transfer with Pseudo Partial Labels (Lei et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.153.pdf
Software:
2022.findings-naacl.153.software.zip
Video:
https://aclanthology.org/2022.findings-naacl.153.mp4
Code:
slei109/cltp
Data:
XTREME