Leveraging Multilingual Resources for Language Invariant Sentiment Analysis

Allen Antony, Arghya Bhattacharya, Jaipal Goud, Radhika Mamidi


Abstract
Sentiment analysis is a widely researched NLP problem with state-of-the-art solutions capable of attaining human-like accuracies for various languages. However, these methods rely heavily on large amounts of labeled data or sentiment weighted language-specific lexical resources that are unavailable for low-resource languages. Our work attempts to tackle this data scarcity issue by introducing a neural architecture for language invariant sentiment analysis capable of leveraging various monolingual datasets for training without any kind of cross-lingual supervision. The proposed architecture attempts to learn language agnostic sentiment features via adversarial training on multiple resource-rich languages which can then be leveraged for inferring sentiment information at a sentence level on a low resource language. Our model outperforms the current state-of-the-art methods on the Multilingual Amazon Review Text Classification dataset [REF] and achieves significant performance gains over prior work on the low resource Sentiraama corpus [REF]. A detailed analysis of our research highlights the ability of our architecture to perform significantly well in the presence of minimal amounts of training data for low resource languages.
Anthology ID:
2020.eamt-1.9
Volume:
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month:
November
Year:
2020
Address:
Lisboa, Portugal
Venue:
EAMT
SIG:
Publisher:
European Association for Machine Translation
Note:
Pages:
71–79
Language:
URL:
https://aclanthology.org/2020.eamt-1.9
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2020.eamt-1.9.pdf