English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network

Ying Hao Lim, Jasy Suet Yan Liew


Abstract
The main challenge in English-Malay cross-lingual emotion classification is that no Malay emotion training corpora exist. Because machine translation can fall short on contextually complex tweets, we limited machine translation to the word level. In this paper, we bridge the language gap between English and Malay through cross-lingual word embeddings constructed using singular value decomposition. We pre-trained our hierarchical attention model on English tweets and fine-tuned it on a set of gold-standard Malay tweets. Our model uses significantly fewer computational resources than the pre-trained language models. Experimental results show that our model outperforms mBERT by 2.4% in zero-shot learning and Malay BERT by 0.8% when only a limited number of Malay tweets is available. In exchange for 6–7 times less computational time, our model lags behind mBERT and XLM-RoBERTa by only 0.9–4.3% in few-shot learning. In addition, the word-level attention could be transferred accurately to the Malay tweets using the cross-lingual word embeddings.
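The abstract states that the cross-lingual embeddings are built with singular value decomposition but does not give the exact procedure. One widely used SVD-based technique is orthogonal Procrustes alignment: given a seed bilingual dictionary of n Malay-English word pairs, find the orthogonal matrix W that maps Malay vectors onto their English translations. The sketch below assumes that technique, uses synthetic vectors in place of real embeddings, and introduces hypothetical helper names (`procrustes_align`, `normalize`, `translate`); it is an illustration, not the authors' implementation.

```python
import numpy as np


def procrustes_align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||X @ W - Y||_F (orthogonal Procrustes).

    X: (n, d) source-language vectors (e.g. Malay) for n dictionary pairs.
    Y: (n, d) target-language vectors (e.g. English); row i translates row i of X.
    """
    # SVD of the d x d cross-covariance matrix: X^T Y = U S V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    # Closed-form solution under the constraint W^T W = I: W = U V^T.
    return U @ Vt


def normalize(E: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so dot products are cosine similarities."""
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def translate(src: np.ndarray, W: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Map source vectors with W, then return the index of the most
    cosine-similar target vector for each source row."""
    sims = normalize(src @ W) @ normalize(tgt).T
    return sims.argmax(axis=1)


# Toy sizes (assumed): 200 dictionary pairs, 50-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 200, 50

# Synthetic data: "English" vectors are an exact rotation Q of "Malay" vectors.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X = normalize(rng.standard_normal((n, d)))
Y = X @ Q

W = procrustes_align(X, Y)
pred = translate(X, W, Y)

print(np.allclose(W, Q))               # True: the rotation is recovered
print((pred == np.arange(n)).mean())   # 1.0 on this noise-free toy data
```

Constraining W to be orthogonal preserves distances and angles within the Malay space, so monolingual neighbourhoods survive the mapping; with real embeddings and a noisy dictionary the recovery is approximate rather than exact.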
Anthology ID:
2022.wassa-1.12
Volume:
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Jeremy Barnes, Orphée De Clercq, Valentin Barriere, Shabnam Tafreshi, Sawsan Alqahtani, João Sedoc, Roman Klinger, Alexandra Balahur
Venue:
WASSA
Publisher:
Association for Computational Linguistics
Pages:
113–124
URL:
https://aclanthology.org/2022.wassa-1.12
DOI:
10.18653/v1/2022.wassa-1.12
Cite (ACL):
Ying Hao Lim and Jasy Suet Yan Liew. 2022. English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network. In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pages 113–124, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
English-Malay Word Embeddings Alignment for Cross-lingual Emotion Classification with Hierarchical Attention Network (Lim & Liew, WASSA 2022)
PDF:
https://aclanthology.org/2022.wassa-1.12.pdf
Video:
https://aclanthology.org/2022.wassa-1.12.mp4