Piotr Grzybowski


2019

Sparse Coding in Authorship Attribution for Polish Tweets
Piotr Grzybowski | Ewa Juralewicz | Maciej Piasecki
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The study explores the application of a simple Convolutional Neural Network to the problem of authorship attribution of tweets written in Polish. In our solution, we use two-step compression of tweets, using the Byte Pair Encoding algorithm and vectorisation, as input to a distributional model generated for a large corpus of Polish tweets by the word2vec algorithm. Our method achieves results comparable to state-of-the-art approaches for a similar task on English tweets and performs very well in the classification of Polish tweets. We tested the proposed method with respect to the number of authors and the number of tweets per author. We also compared results for authors with different topic backgrounds.
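
The abstract names Byte Pair Encoding as the compression step but does not describe the implementation. As a rough illustration of the general technique only, and not the authors' code, the following minimal sketch learns BPE merges from a toy word list and applies them to new strings. The function names (learn_bpe, merge_pair, encode) and the toy data are assumptions made for this example; the vectorisation, word2vec, and CNN stages are not shown.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single characters.

    Hypothetical helper for illustration; real pipelines usually rely on
    an existing BPE library and a much larger corpus.
    """
    # Each distinct word is a tuple of symbols mapped to its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(text, merges):
    """Apply the learned merges, in the order they were learned, to `text`."""
    symbols = tuple(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_merges = learn_bpe(["abab", "abab", "abc"], num_merges=2)
    print(toy_merges)
    print(encode("abab", toy_merges))
    print(encode("abc", toy_merges))
```

Frequent character sequences collapse into single symbols, so a tweet is represented by fewer, longer units than raw characters. In a setup like the one the abstract outlines, such units would then be mapped to vectors before classification.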