Karolina Zaczynska

2025

Expanding the UNSC Conflicts Corpus by Incorporating Domain Expert Annotations and LLM Experiments
Karolina Zaczynska
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)

In this work we expand the UN Security Council Conflicts corpus (UNSCon) (Zaczynska et al., 2024) on verbal disputes in diplomatic speeches in English.

2024

How Diplomats Dispute: The UN Security Council Conflict Corpus
Karolina Zaczynska | Peter Bourgonje | Manfred Stede
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We investigate disputes in the United Nations Security Council (UNSC) by studying the linguistic means of expressing conflicts. As a result, we present the UNSC Conflict Corpus (UNSCon), a collection of 87 UNSC speeches that are annotated for conflicts. We explain and motivate our annotation scheme and report on a series of experiments for automatic conflict classification. Further, we demonstrate the difficulty of dealing with diplomatic language, which is highly complex and often implicit along various dimensions, by providing corpus examples, readability scores, and classification results.

Rhetorical Strategies in the UN Security Council: Rhetorical Structure Theory and Conflicts
Karolina Zaczynska | Manfred Stede
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

More and more corpora are being annotated with Rhetorical Structure Theory (RST) trees, often in a multi-layer scenario, as analyzing RST annotations in combination with other layers can lead to a deeper understanding of texts. To date, however, prior work on RST for the analysis of diplomatic language is scarce. We are interested in political speeches and investigate what rhetorical strategies diplomats use to communicate critique or deal with disputes. To this end, we present a new dataset with RST annotations of 82 diplomatic speeches aligned to existing Conflict annotations (UNSC-RST). We explore ways of using rhetorical trees to analyze an annotated multi-layer corpus, looking at both the relation distribution and the tree structure of speeches. In preliminary analyses we already see patterns that are characteristic of particular topics or countries.

2023

The UNSC-Graph: An Extensible Knowledge Graph for the UNSC Corpus
Stian Rødven-Eide | Karolina Zaczynska | Antonio Pires | Ronny Patz | Manfred Stede
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

Toward a Multilingual Connective Database: Aligning German/French Concessive Connectives
Sophia Rauh | Karolina Zaczynska | Peter Bourgonje
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

2022

We present an extension of the SynSemClass Event-type Ontology, originally conceived as a bilingual Czech-English resource. We added German entries to the classes representing the concepts of the ontology. Because we had a different starting point than the original work (an unannotated parallel corpus without links to a valency lexicon and, of course, different existing lexical resources), it was a challenge to adapt the annotation guidelines, the data model and the tools used for the original version. We describe the process and results of working in such a setup. We also show the next steps to adapt the annotation process, data structures, formats and tools necessary to make the addition of a new language smoother and more efficient in the future, and possibly to allow various teams to work on SynSemClass extensions to many languages concurrently. We also present the latest release, which contains the results of adding German and is freely available for download as well as for online access.

2021

Extraction and Normalization of Vague Time Expressions in German
Ulrike May | Karolina Zaczynska | Julián Moreno-Schneider | Georg Rehm
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

Fine-grained Classification of Political Bias in German News: A Data Set and Initial Experiments
Dmitrii Aksenov | Peter Bourgonje | Karolina Zaczynska | Malte Ostendorff | Julian Moreno-Schneider | Georg Rehm
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We present a data set consisting of German news articles labeled for political bias on a five-point scale in a semi-supervised way. While earlier work on hyperpartisan news detection uses binary classification (i.e., hyperpartisan or not) and English data, we argue for a more fine-grained classification, covering the full political spectrum (i.e., far-left, left, centre, right, far-right) and for extending research to German data. Understanding political bias helps in accurately detecting hate speech and online abuse. We experiment with different classification methods for political bias detection. Their comparatively low performance (a macro-F1 of 43 for our best setup, compared to a macro-F1 of 79 for the binary classification task) underlines the need for more (balanced) data annotated in a fine-grained way.