Kaisla Kajava


2021

A COVID-19 news coverage mood map of Europe
Frankie Robertson | Jarkko Lagus | Kaisla Kajava
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

We present a COVID-19 news dashboard which visualizes sentiment in pandemic news coverage in different languages across Europe. The dashboard shows analyses for positive/neutral/negative sentiment and moral sentiment for news articles across countries and languages. First we extract news articles from news-crawl. Then we use a pre-trained multilingual BERT model for sentiment analysis of news article headlines and a dictionary- and word-vector-based method for moral sentiment analysis of news articles. The resulting dashboard gives a unified overview of COVID-19 news events, their overall sentiment, and the region and language of publication for the period from the beginning of January 2020 to the end of January 2021.

2020

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman | Marc Pàmies | Kaisla Kajava | Jörg Tiedemann
Proceedings of the 28th International Conference on Computational Linguistics

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
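
As a rough illustration of what a multilabel annotation over Plutchik's eight core emotions plus neutral can look like, the sketch below encodes a set of emotion labels as a multi-hot vector. The label ordering and the helper names `encode` and `decode` are assumptions made for this example and are not taken from the XED release.

```python
# Plutchik's eight core emotions plus "neutral".
# The ordering here is an assumption for illustration only.
LABELS = ["anger", "anticipation", "disgust", "fear", "joy",
          "sadness", "surprise", "trust", "neutral"]
INDEX = {name: i for i, name in enumerate(LABELS)}


def encode(emotions):
    """Return a multi-hot vector with a 1 for every emotion present."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[INDEX[name]] = 1
    return vector


def decode(vector):
    """Return the emotion names whose positions are set in the vector."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


# A sentence annotated with two emotions at once (multilabel).
example = encode(["joy", "trust"])
print(example)
print(decode(example))
```

A sentence can carry several emotions at once, which is why each label gets its own position in the vector instead of a single class index.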

LT@Helsinki at SemEval-2020 Task 12: Multilingual or Language-specific BERT?
Marc Pàmies | Emily Öhman | Kaisla Kajava | Jörg Tiedemann
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.

2018

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation
Emily Öhman | Kaisla Kajava | Jörg Tiedemann | Timo Honkela
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.