Toward the Limitation of Code-Switching in Cross-Lingual Transfer

Yukun Feng, Feng Li, Philipp Koehn


Abstract
Multilingual pretrained models have shown strong cross-lingual transfer ability. Some prior work has used code-switched sentences, which consist of tokens from multiple languages, to further enhance cross-lingual representations, and has shown success on many zero-shot cross-lingual tasks. However, code-switched tokens are likely to make the newly substituted sentences grammatically incoherent, which hurts performance on token-sensitive tasks such as Part-of-Speech (POS) tagging and Named Entity Recognition (NER). This paper mitigates this limitation of the code-switching method by not only replacing tokens but also considering the similarity between the context and the switched tokens, so that the newly substituted sentences remain grammatically consistent during both training and inference. We conduct experiments on cross-lingual POS tagging and NER over more than 30 languages and demonstrate the effectiveness of our method, which outperforms mBERT by 0.95 F1 and the original code-switching method by 1.67 F1.
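The abstract does not spell out the exact procedure, so the following is only a hypothetical sketch of the general idea of context-aware code-switching, not the authors' implementation. It assumes a bilingual dictionary and static word embeddings (both toy, made-up inputs here), and it uses cosine similarity between a candidate translation and the mean embedding of the surrounding tokens as a stand-in for whatever context similarity the paper actually uses. All function names and parameters (`context_aware_code_switch`, `ratio`, `threshold`) are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(tokens: List[str], position: int,
                   embed: Dict[str, Vector]) -> Optional[List[float]]:
    """Mean embedding of all known tokens except the one at `position`."""
    others = [embed[t] for i, t in enumerate(tokens) if i != position and t in embed]
    if not others:
        return None
    dim = len(others[0])
    return [sum(vec[d] for vec in others) / len(others) for d in range(dim)]


def context_aware_code_switch(tokens: List[str],
                              dictionary: Dict[str, List[str]],
                              embed: Dict[str, Vector],
                              ratio: float = 0.5,
                              threshold: float = 0.0,
                              seed: int = 0) -> List[str]:
    """Hypothetical sketch: replace a token only with the dictionary
    translation that best fits its original-language context, and only
    if that fit reaches `threshold`."""
    rng = random.Random(seed)
    switched = list(tokens)
    for i, tok in enumerate(tokens):
        candidates = [c for c in dictionary.get(tok, []) if c in embed]
        # With probability `ratio`, consider switching this token at all.
        if not candidates or rng.random() >= ratio:
            continue
        # Context is computed from the original (unswitched) sentence.
        ctx = context_vector(tokens, i, embed)
        if ctx is None:
            continue
        # Plain code-switching would pick any candidate; here we pick the
        # one most similar to the context and skip poor fits entirely.
        best = max(candidates, key=lambda c: cosine(embed[c], ctx))
        if cosine(embed[best], ctx) >= threshold:
            switched[i] = best
    return switched


if __name__ == "__main__":
    # Toy data (hypothetical): German "Ufer" = riverbank, "Bank" = bank/bench.
    dictionary = {"bank": ["Ufer", "Bank"]}
    embed = {"river": [1.0, 0.0], "bank": [0.5, 0.5],
             "Ufer": [0.9, 0.1], "Bank": [0.1, 0.9]}
    print(context_aware_code_switch(["river", "bank"], dictionary, embed, ratio=1.0))
    # prints ['river', 'Ufer']
```

In this toy run, "bank" next to "river" is switched to "Ufer" rather than "Bank" because its embedding lies closer to the context; raising `threshold` makes the sketch leave tokens untouched when no translation fits well, which is one simple way to trade augmentation volume for grammatical consistency.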
Anthology ID: 2022.emnlp-main.400
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5966–5971
URL: https://aclanthology.org/2022.emnlp-main.400
DOI: 10.18653/v1/2022.emnlp-main.400
Cite (ACL): Yukun Feng, Feng Li, and Philipp Koehn. 2022. Toward the Limitation of Code-Switching in Cross-Lingual Transfer. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5966–5971, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Toward the Limitation of Code-Switching in Cross-Lingual Transfer (Feng et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.400.pdf