Carita Paradis

2017

Identifying the Authors’ National Variety of English in Social Media Texts
Vasiliki Simaki | Panagiotis Simakis | Carita Paradis | Andreas Kerren
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.

2016

Unshared task: (Dis)agreement in online debates
Maria Skeppstedt | Magnus Sahlgren | Carita Paradis | Andreas Kerren
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Active learning for detection of stance components
Maria Skeppstedt | Magnus Sahlgren | Carita Paradis | Andreas Kerren
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition.