Language Embeddings for Typology and Cross-lingual Transfer Learning

Dian Yu, Taiqi He, Kenji Sagae


Abstract
Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
Anthology ID:
2021.acl-long.560
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
7210–7225
URL:
https://aclanthology.org/2021.acl-long.560
DOI:
10.18653/v1/2021.acl-long.560
Cite (ACL):
Dian Yu, Taiqi He, and Kenji Sagae. 2021. Language Embeddings for Typology and Cross-lingual Transfer Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7210–7225, Online. Association for Computational Linguistics.
Cite (Informal):
Language Embeddings for Typology and Cross-lingual Transfer Learning (Yu et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.560.pdf
Video:
https://aclanthology.org/2021.acl-long.560.mp4
Code:
DianDYu/language_embeddings
Data:
XNLI