Pihla Toivanen

2025

Insights into developing analytical categorization schemes: three problem types related to annotation agreement
Pihla Toivanen | Eetu Mäkelä | Antti Kanner
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Coding themes, frames, opinions and other attributes is a widely used practice in the social sciences, and it also forms the basis for building supervised text classifiers. Coding content is resource-intensive, and lately the process has been used particularly for annotating training sets for machine learning models. Although objectivity is not always the goal of coding, consistent coding helps in building a machine learning model. Usually, machine learning models are built by first defining an annotation scheme, which contains definitions of the categories and instructions for coding. It is known that multiple factors affect annotation results, such as the domain of annotation, the number of annotators, and the number of categories. In this article, we present a few more problems that we show to be related to the annotation results in our case study: negated presence of a category, low proportional presence of relevant content, and implicit presence of a category. These problems should be resolved in all schemes at the level of scheme definition. To extract our problem categories, we focus on a media research case with extensive data on both the process and the results.

Implicit and Indirect: Detecting Face-threatening and Paired Actions in Asynchronous Online Conversations
Henna Paakki | Pihla Toivanen | Kaisla Kajava
Northern European Journal of Language Technology, Volume 11

This paper presents an approach to computationally detecting face-threatening and paired actions in asynchronous online conversations. Action detection has been widely studied for synchronous chats. However, there are fewer models or datasets for asynchronous conversations, and they have not included some of the face-threatening actions central to online conversations involving misbehavior like trolling. We examine asynchronous online conversations in Finnish related to crisis news, providing an annotation scheme for identifying central actions used in this conversational context. An important contribution is the inclusion of face-threatening actions in the scheme and the training of computational classifiers for their detection, with improved performance compared to prior work. We illustrate that face-threatening actions are important for analyzing conversations related to crisis news. We show that for computational action detection, it is essential to be able to represent how multiple actions may be performed within one comment, and how ambiguity in the expression of actions often leads to multiple possible label interpretations. Annotating actions using scores helps to reflect these characteristics. We also find that an ensemble of models trained on individual annotators’ annotations can best represent multiple potential interpretations of action labels. These are especially relevant for face-threatening actions.
