Cross-lingual Transfer Learning with Persian

Sepideh Mollanorozy, Marc Tanti, Malvina Nissim


Abstract
The success of cross-lingual transfer learning for POS tagging has been shown to depend strongly, among other factors, on the (typological and/or genetic) similarity between the low-resource language used for testing and the language(s) used in pre-training or to fine-tune the model. We further unpack this finding in two directions by zooming in on a single language, namely Persian. First, still focusing on POS tagging, we run an in-depth analysis of the behaviour of Persian with respect to closely related languages and to languages that appear to benefit from cross-lingual transfer with Persian. To do so, we also use the World Atlas of Language Structures (WALS) to determine which properties are shared between Persian and the other languages included in the experiments. Based on our results, Persian seems to be a reasonable potential source language for the low-resource languages Kurmanji and Tagalog on other tasks as well. Second, we test whether previous findings also hold on a task other than POS tagging, to pull apart the benefit of language similarity from the specific task for which such benefit has been shown to hold. We gather sentiment analysis datasets for 31 target languages and, through a series of cross-lingual experiments, analyse which languages benefit most from Persian as the source. The set of languages that benefit from Persian had very little overlap across the two tasks, suggesting a strong task-dependent component in the usefulness of language similarity in cross-lingual transfer.
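Determining which WALS properties two languages share comes down to comparing their feature tables and keeping the features that are documented for both with the same value. The following is a minimal sketch under that assumption, not the authors' code. The helper names `shared_features` and `overlap_ratio`, the feature IDs (`feat_1`, ...), and all values are hypothetical placeholders, not real WALS data.

```python
from typing import Dict, Optional

# Hypothetical WALS-style table: feature ID -> value (None = not documented).
Features = Dict[str, Optional[str]]


def shared_features(source: Features, target: Features) -> Dict[str, str]:
    """Return features documented for both languages with identical values."""
    shared: Dict[str, str] = {}
    for feature, value in source.items():
        other = target.get(feature)
        # Skip features missing in either language; WALS coverage is sparse.
        if value is not None and other is not None and value == other:
            shared[feature] = value
    return shared


def overlap_ratio(source: Features, target: Features) -> float:
    """Fraction of features documented for both languages that share a value."""
    both = [f for f, v in source.items()
            if v is not None and target.get(f) is not None]
    if not both:
        return 0.0
    return len(shared_features(source, target)) / len(both)


if __name__ == "__main__":
    # Placeholder languages and values, for illustration only.
    lang_a = {"feat_1": "SOV", "feat_2": "postpositions",
              "feat_3": None, "feat_4": "suffixing"}
    lang_b = {"feat_1": "SOV", "feat_2": "prepositions",
              "feat_3": "x", "feat_4": "suffixing"}
    print(shared_features(lang_a, lang_b))
    print(overlap_ratio(lang_a, lang_b))
```

Dividing by the number of jointly documented features, rather than by the full feature inventory, keeps sparsely documented languages from looking artificially dissimilar.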
Anthology ID: 2023.sigtyp-1.9
Volume: Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Lisa Beinborn, Koustava Goswami, Saliha Muradoğlu, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Edoardo M. Ponti, Ryan Cotterell, Ekaterina Vylomova
Venue: SIGTYP
Publisher: Association for Computational Linguistics
Pages: 89–95
URL: https://aclanthology.org/2023.sigtyp-1.9
DOI: 10.18653/v1/2023.sigtyp-1.9
Cite (ACL): Sepideh Mollanorozy, Marc Tanti, and Malvina Nissim. 2023. Cross-lingual Transfer Learning with Persian. In Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 89–95, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Cross-lingual Transfer Learning with Persian (Mollanorozy et al., SIGTYP 2023)
PDF: https://aclanthology.org/2023.sigtyp-1.9.pdf
Video: https://aclanthology.org/2023.sigtyp-1.9.mp4