Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages

Pranaydeep Singh, Orphee De Clercq, Els Lefever


Abstract
This paper reports on experiments for cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. Although the approach was initially designed for ELMo embeddings, we analyze it for the more recent BERT family of transformers on a variety of tasks, both monolingual and cross-lingual. The results largely show that, like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resourced language, while performing adequately for the higher-resourced ones. We attempt to provide insights into both the quality of the anchors and the performance for low-shot cross-lingual transfer, to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak.
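
The anchor-based approach referred to in the abstract has two ingredients: a static anchor is computed for each word type by averaging its contextual vectors over a corpus, and the anchor spaces of two languages are then aligned. Below is a minimal sketch of both steps, assuming the Hugging Face transformers and torch libraries. It is not the authors' released code (see the Vyaapak repository for that); the model name, layer choice, and the orthogonal-Procrustes alignment shown here are illustrative assumptions.

```python
# Minimal sketch, NOT the authors' released code: model name, layer choice
# and function names are illustrative assumptions.
from collections import defaultdict

import torch
from transformers import AutoModel, AutoTokenizer


def extract_anchors(sentences, model_name="bert-base-multilingual-cased", layer=8):
    """Derive a static anchor per word type by averaging its contextual
    vectors over a corpus (the anchor idea of Schuster et al., 2019,
    applied here to a BERT-family model instead of ELMo)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    sums, counts = {}, defaultdict(int)
    with torch.no_grad():
        for sent in sentences:
            words = sent.split()
            enc = tokenizer(words, is_split_into_words=True,
                            return_tensors="pt", truncation=True)
            hidden = model(**enc).hidden_states[layer][0]  # (seq_len, dim)
            for idx, wid in enumerate(enc.word_ids()):
                if wid is None:          # skip special tokens ([CLS], [SEP])
                    continue
                w = words[wid]
                sums[w] = hidden[idx] + sums.get(w, 0.0)
                counts[w] += 1
    # The anchor for a word type is the mean of all its contextual vectors.
    return {w: vec / counts[w] for w, vec in sums.items()}


def procrustes(src, tgt):
    """Orthogonal Procrustes: the W minimising ||src @ W - tgt||_F, for
    paired (n, d) anchor matrices from a bilingual seed lexicon."""
    u, _, vh = torch.linalg.svd(src.T @ tgt)
    return u @ vh
```

Anchors extracted for, e.g., Hindi could then be mapped into the English anchor space via `hindi_anchors @ procrustes(src_pairs, tgt_pairs)`; orthogonal Procrustes is a common choice for this alignment step, though not necessarily the exact mapping used in the paper.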
Anthology ID:
2022.sigul-1.23
Volume:
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venue:
SIGUL
SIG:
SIGUL
Publisher:
European Language Resources Association
Pages:
176–184
URL:
https://aclanthology.org/2022.sigul-1.23
Cite (ACL):
Pranaydeep Singh, Orphee De Clercq, and Els Lefever. 2022. Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 176–184, Marseille, France. European Language Resources Association.
Cite (Informal):
Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages (Singh et al., SIGUL 2022)
PDF:
https://aclanthology.org/2022.sigul-1.23.pdf
Code:
pranaydeeps/vyaapak
Data:
XNLI