Aligning Multilingual Embeddings for Improved Code-switched Natural Language Understanding

Barah Fazili, Preethi Jyothi


Abstract
Multilingual pretrained models, while effective on monolingual data, need additional training to work well with code-switched text. In this work, we propose training multilingual models with alignment objectives on parallel text, so as to explicitly align the representations of words that share the same underlying semantics across languages. This explicit alignment step has a positive downstream effect and improves performance on multiple code-switched NLP tasks. We explore two alignment strategies and report improvements of up to 7.32%, 0.76% and 1.9% on Hindi-English Sentiment Analysis, Named Entity Recognition and Question Answering tasks, respectively, compared to a competitive baseline model.
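The abstract does not spell out the two alignment strategies, so the following is only an illustrative sketch of what an explicit word-level alignment objective over parallel text could look like. It assumes a mean squared distance between the embeddings of aligned word pairs; the function name `alignment_loss`, the toy vectors, and the index pairs are all hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def alignment_loss(
    src: Sequence[Vector],
    tgt: Sequence[Vector],
    pairs: Sequence[Tuple[int, int]],
) -> float:
    """Mean squared Euclidean distance over aligned (src_idx, tgt_idx) pairs.

    Assumption: this is one generic way to pull together the representations
    of words with the same meaning; it is not the paper's stated objective.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        # Squared distance between one source word and its aligned target word.
        total += sum((a - b) ** 2 for a, b in zip(src[i], tgt[j]))
    return total / len(pairs)


# Toy 2-dimensional "embeddings" for a two-word parallel sentence pair.
en = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical English word vectors
hi = [[1.0, 0.0], [0.0, 0.0]]   # hypothetical Hindi word vectors
pairs = [(0, 0), (1, 1)]        # hypothetical word alignments (en index, hi index)

loss = alignment_loss(en, hi, pairs)
```

In a real training setup a term of this kind would typically be computed on contextual embeddings from the multilingual model and minimized alongside, or before, the downstream task loss; how the authors actually combine the objectives is described in the paper itself.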
Anthology ID:
2022.coling-1.375
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4268–4273
URL:
https://aclanthology.org/2022.coling-1.375
Cite (ACL):
Barah Fazili and Preethi Jyothi. 2022. Aligning Multilingual Embeddings for Improved Code-switched Natural Language Understanding. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4268–4273, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Aligning Multilingual Embeddings for Improved Code-switched Natural Language Understanding (Fazili & Jyothi, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.375.pdf