Sentiment Analysis for Multilingual Corpora

Svitlana Galeshchuk, Ju Qiu, Julien Jourdan


Abstract
The paper presents a generic approach to supervised sentiment analysis of social media content in Slavic languages. The method translates documents from the original language into English with Google’s Neural Translation Model. The resulting texts are then converted to vectors by averaging the vector representations of words derived from a pre-trained English Word2Vec model. Testing the approach with several machine learning methods on Polish, Slovenian, and Croatian Twitter datasets yields up to 86% classification accuracy on out-of-sample data.
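The averaging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the text has already been translated into English, and it replaces the pre-trained Word2Vec model with a tiny hypothetical lookup table (`TOY_EMBEDDINGS`). The helper name `average_vector` and the whitespace tokenization are likewise assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings standing in for a pre-trained
# English Word2Vec model; the values are made up for illustration.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "movie": [0.1, 0.5, 0.4],
    "terrible": [-0.8, 0.2, 0.1],
}


def average_vector(text: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Represent a document as the mean of its known word vectors."""
    # Assumption: simple lowercasing and whitespace tokenization.
    tokens = text.lower().split()
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token is in the vocabulary: fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# An already-translated (English) tweet becomes one fixed-length vector.
doc_vector = average_vector("Great movie", TOY_EMBEDDINGS, dim=3)
```

The resulting fixed-length vectors could then be passed as features to any standard classifier, which is the role the "several machine learning methods" play in the abstract.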
Anthology ID:
W19-3717
Volume:
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
120–125
URL:
https://aclanthology.org/W19-3717/
DOI:
10.18653/v1/W19-3717
Cite (ACL):
Svitlana Galeshchuk, Ju Qiu, and Julien Jourdan. 2019. Sentiment Analysis for Multilingual Corpora. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 120–125, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Sentiment Analysis for Multilingual Corpora (Galeshchuk et al., BSNLP 2019)
PDF:
https://aclanthology.org/W19-3717.pdf
Code:
GSukr/Sentiment_Analysis_Multilingual_Corpora