Do machines dream of artificial agreement?
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
In this paper the (assumed) inconsistency between F1-scores and annotator agreement measures is discussed. This is exemplified in five corpora from the field of argumentation mining. High agreement is important in most annotation tasks and also often deemed important for an annotated dataset to be useful for machine learning. However, depending on the annotation task, achieving high agreement is not always easy. This is especially true in the field of argumentation mining, because argumentation can be complex as well as implicit. There are also many different models of argumentation, which can be seen in the increasing number of argumentation annotated corpora. Many of these reach moderate agreement but are still used in machine learning tasks, reaching high F1-score. In this paper we describe five corpora, in particular how they have been created and used, to see how they have handled disagreement. We find that agreement can be raised post-production, but that more discussion regarding evaluating and calculating agreement is needed. We conclude that standardisation of the models and the evaluation methods could help such discussions.
Annotating argumentation in Swedish social media
Proceedings of the 7th Workshop on Argument Mining
This paper presents a small study of annotating argumentation in Swedish social media. Annotators were asked to annotate spans of argumentation in 9 threads from two discussion forums. At the post level, Cohen’s k and Krippendorff’s alpha 0.48 was achieved. When manually inspecting the annotations the annotators seemed to agree when conditions in the guidelines were explicitly met, but implicit argumentation and opinions, resulting in annotators having to interpret what’s missing in the text, caused disagreements.
Towards Assessing Argumentation Annotation - A First Step
Anna Lindahl | Lars Borin | Jacobo Rouces
Proceedings of the 6th Workshop on Argument Mining
This paper presents a first attempt at using Walton’s argumentation schemes for annotating arguments in Swedish political text and assessing the feasibility of using this particular set of schemes with two linguistically trained annotators. The texts are not pre-annotated with argumentation structure beforehand. The results show that the annotators differ both in number of annotated arguments and selection of the conclusion and premises which make up the arguments. They also differ in their labeling of the schemes, but grouping the schemes increases their agreement. The outcome from this will be used to develop guidelines for future annotations.