Comparative Analysis of Cross-lingual Contextualized Word Embeddings

Hossain Shaikh Saadi, Viktor Hangya, Tobias Eder, Alexander Fraser


Abstract
Contextualized word embeddings have emerged as the most important tool for performing NLP tasks in a large variety of languages. To improve cross-lingual representations and transfer learning quality, contextualized embedding alignment techniques, such as mapping and model fine-tuning, are employed. Existing techniques, however, are time-, data-, and computational-resource-intensive. In this paper we analyze these techniques using three tasks: bilingual lexicon induction (BLI), word retrieval, and cross-lingual natural language inference (XNLI), for a high-resource (German-English) and a low-resource (Bengali-English) language pair. In contrast to previous work, which focuses only on a few popular models, we compare five multilingual and seven monolingual language models and investigate how various aspects, such as vocabulary size, number of training languages, and number of parameters, affect their performance. Additionally, we propose a parameter-, data-, and runtime-efficient technique that can be trained with 10% of the data, in less than 10% of the time, and with less than 5% of the trainable parameters compared to model fine-tuning. We show that our proposed method is competitive with resource-heavy models, even outperforming them in some cases, despite relying on fewer resources.
Anthology ID:
2022.mrl-1.6
Volume:
Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Duygu Ataman, Hila Gonen, Sebastian Ruder, Orhan Firat, Gözde Gül Sahin, Jamshidbek Mirzakhalov
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
64–75
URL:
https://aclanthology.org/2022.mrl-1.6
DOI:
10.18653/v1/2022.mrl-1.6
Cite (ACL):
Hossain Shaikh Saadi, Viktor Hangya, Tobias Eder, and Alexander Fraser. 2022. Comparative Analysis of Cross-lingual Contextualized Word Embeddings. In Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pages 64–75, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Comparative Analysis of Cross-lingual Contextualized Word Embeddings (Saadi et al., MRL 2022)
PDF:
https://aclanthology.org/2022.mrl-1.6.pdf