Predicting Sentiment of Polish Language Short Texts

Aleksander Wawer, Julita Sobiczewska


Abstract
The goal of this paper is to use all available Polish language data sets to seek the best possible performance in supervised sentiment analysis of short texts. We use text collections with labelled sentiment, such as tweets, movie reviews and a sentiment treebank, in three comparison modes. In the first, we examine the performance of models trained and tested on the same text collection using standard cross-validation (in-domain). In the second, we train models on all available data except the given test collection, which we use for testing (one-vs-rest cross-domain). In the third, we train a model on one data set and apply it to another one (one-vs-one cross-domain). We compare a wide range of methods, including machine learning on bag-of-words representations, bidirectional recurrent neural networks, and the most recent pre-trained architectures, ELMo and BERT. We formulate conclusions as to the cross-domain and in-domain performance of each method. Unsurprisingly, BERT turned out to be a strong performer, especially in the cross-domain setting. What is surprising, however, is the solid performance of the relatively simple multinomial Naive Bayes classifier, which performed as well as BERT on several data sets.
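The three comparison modes described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: it uses a hand-written multinomial Naive Bayes over whitespace-tokenised bag-of-words counts, and the three tiny collections, the function names (`train_nb`, `in_domain`, `one_vs_rest`, `one_vs_one`) and the 2-fold split are all assumptions made up for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on bag-of-words counts (add-alpha smoothing)."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"words": word_counts, "classes": Counter(labels),
            "vocab": vocab, "alpha": alpha}

def predict_nb(model, text):
    """Return the label with the highest log-posterior; unseen tokens are skipped."""
    total_docs = sum(model["classes"].values())
    best_label, best_score = None, -math.inf
    for label in sorted(model["classes"]):  # sorted for deterministic ties
        counts = model["words"][label]
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(model["classes"][label] / total_docs)
        for tok in text.lower().split():
            if tok in model["vocab"]:
                score += math.log((counts[tok] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, texts, labels):
    hits = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return hits / len(texts)

def in_domain(data, name, k=2):
    """Mode 1: k-fold cross-validation inside a single collection."""
    texts, labels = data[name]
    pairs = list(zip(texts, labels))
    scores = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        scores.append(accuracy(train_nb(*zip(*train)), *zip(*test)))
    return sum(scores) / k

def one_vs_rest(data, test_name):
    """Mode 2: train on every collection except the held-out test collection."""
    train_texts, train_labels = [], []
    for name, (texts, labels) in data.items():
        if name != test_name:
            train_texts += texts
            train_labels += labels
    return accuracy(train_nb(train_texts, train_labels), *data[test_name])

def one_vs_one(data, train_name, test_name):
    """Mode 3: train on one collection, test on another."""
    return accuracy(train_nb(*data[train_name]), *data[test_name])

# Invented toy collections (NOT the data sets used in the paper).
LABELS = ["pos", "neg", "neg", "pos"]
DATA = {
    "tweets": (["super dzień", "zły dzień", "zły mecz", "super mecz"], LABELS),
    "reviews": (["super film", "zły film", "słaby film", "dobry film"], LABELS),
    "treebank": (["dobry pomysł", "słaby pomysł", "zły pomysł", "super pomysł"], LABELS),
}

if __name__ == "__main__":
    print("in-domain (reviews):", in_domain(DATA, "reviews"))
    print("one-vs-rest (test=tweets):", one_vs_rest(DATA, "tweets"))
    print("one-vs-one (tweets -> reviews):", one_vs_one(DATA, "tweets", "reviews"))
```

On real data the same three functions would be called once per collection (modes 1 and 2) or once per ordered pair of collections (mode 3), with the classifier swapped for whichever method is being compared.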
Anthology ID:
R19-1151
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1321–1327
URL:
https://aclanthology.org/R19-1151/
DOI:
10.26615/978-954-452-056-4_151
Cite (ACL):
Aleksander Wawer and Julita Sobiczewska. 2019. Predicting Sentiment of Polish Language Short Texts. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1321–1327, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Predicting Sentiment of Polish Language Short Texts (Wawer & Sobiczewska, RANLP 2019)
PDF:
https://aclanthology.org/R19-1151.pdf