2018
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, Alex Gribov
Proceedings of the 27th International Conference on Computational Linguistics
This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.
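The agreement figure above is Fleiss' kappa over three annotations per post. The following is a minimal Python sketch of the standard Fleiss' kappa formula. It assumes every post carries exactly the same number of labels. The function name and the toy labels are illustrative only and are not taken from the RuSentiment release.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that were all rated by the same number of annotators.

    ratings[i] holds the category labels assigned to item i (illustrative input format).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: how many annotators put item i into category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement per item (P_i), then averaged (P-bar)
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Share of all assignments that went to each category (p_j) -> chance agreement P_e
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)
```

With two posts labelled `["pos", "pos", "neg"]` and `["pos", "neg", "neg"]`, observed agreement (1/3) falls below chance agreement (1/2), so kappa comes out negative (-1/3). A value of 0.58 therefore indicates agreement well above chance, though it is not perfect.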
2017
Tracking Bias in News Sources Using Social Media: the Russia-Ukraine Maidan Crisis of 2013–2014
Peter Potash, Alexey Romanov, Mikhail Gronas, Anna Rumshisky
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
This paper addresses the task of identifying bias in news articles published during a political or social conflict. We create a silver-standard corpus based on the actions of users in social media. Specifically, we reconceptualize bias in terms of how likely a given article is to be shared or liked by each of the opposing sides. We apply our methodology to a dataset of links collected in relation to the Russia-Ukraine Maidan crisis of 2013–2014. We show that on the task of predicting which side is likely to prefer a given article, a Naive Bayes classifier can achieve 90.3% accuracy looking only at the domain names of the news sources. The best accuracy, 93.5%, is achieved by a feed-forward neural network. We also apply our methodology to a gold-labeled set of articles annotated for bias, where the aforementioned Naive Bayes classifier achieves 82.6% accuracy and a feed-forward neural network achieves 85.6% accuracy.
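The domain-name baseline can be pictured with a small multinomial Naive Bayes. The following is a hypothetical sketch and not the authors' implementation. The way a domain is turned into features (splitting it on dots) is an assumption, as are the class name `DomainNaiveBayes`, the labels `side_A`/`side_B`, and the example domains. It uses add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict


class DomainNaiveBayes:
    """Multinomial Naive Bayes over domain-name parts with additive smoothing.

    Hypothetical feature choice: "news.example.com" -> ["news", "example", "com"].
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    @staticmethod
    def tokens(domain: str) -> list[str]:
        return domain.lower().split(".")

    def fit(self, domains: list[str], labels: list[str]) -> "DomainNaiveBayes":
        self.class_counts = Counter(labels)           # documents per side
        self.token_counts = defaultdict(Counter)      # token frequencies per side
        self.vocab = set()
        for domain, label in zip(domains, labels):
            toks = self.tokens(domain)
            self.token_counts[label].update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, domain: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus smoothed log likelihood of each domain part
            score = math.log(count / n_docs)
            total = sum(self.token_counts[label].values())
            for tok in self.tokens(domain):
                score += math.log(
                    (self.token_counts[label][tok] + self.alpha) / (total + self.alpha * v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = DomainNaiveBayes().fit(
    ["alpha.ru", "beta.ru", "gamma.ua", "delta.ua"],
    ["side_A", "side_A", "side_B", "side_B"],
)
print(clf.predict("alpha.ru"), clf.predict("epsilon.ua"))  # side_A side_B
```

An unseen domain such as `epsilon.ua` is still classified, because the shared top-level part carries evidence for one side. This is the appeal of treating domain parts as separate features in this sketch.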
2015
Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations
Yen-Fu Luo, Anna Rumshisky, Mikhail Gronas
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)