Marian Marchal

2022

Establishing Annotation Quality in Multi-label Annotations
Marian Marchal | Merel Scholman | Frances Yung | Vera Demberg
Proceedings of the 29th International Conference on Computational Linguistics

In many linguistic fields requiring annotated data, multiple interpretations of a single item are possible. Multi-label annotations more accurately reflect this possibility. However, allowing for multi-label annotations also affects the chance that two coders agree with each other. Calculating inter-coder agreement for multi-label datasets is therefore not trivial. In the current contribution, we evaluate different metrics for calculating agreement on multi-label annotations: agreement on the intersection of annotated labels, an augmented version of Cohen’s Kappa, and precision, recall and F1. We propose a bootstrapping method to obtain chance agreement for each measure, which allows us to obtain an adjusted agreement coefficient that is more interpretable. We demonstrate how various measures affect estimates of agreement on simulated datasets and present a case study of discourse relation annotations. We also show how the proportion of double labels and the entropy of the label distribution influence the measures outlined above, and how a bootstrapped adjusted agreement can make agreement measures more comparable across datasets in multi-label scenarios.

The effect of domain knowledge and implicitation on discourse relation inferences
Marian Marchal | Merel Scholman | Vera Demberg
Dialogue & Discourse Volume 13

Readers draw on their domain knowledge to make inferences about information that is left implicit in the text. The present research investigates the role of domain knowledge in discourse relation interpretation, as this has not been examined experimentally in previous work. We compare interpretations of experts from the fields of economics and biomedical sciences in texts from within and outside of their domain of expertise. The results show that high-knowledge readers are better than low-knowledge readers at inferring the correct relation interpretation. This effect was stronger in relations that contained a connective in the original text than in relations that were originally implicit. The study provides insight into the impact of background knowledge on discourse relation inferencing and how readers interpret discourse relations when they lack the required domain knowledge.

2021

Semi-automatic discourse annotation in a low-resource language: Developing a connective lexicon for Nigerian Pidgin
Marian Marchal | Merel Scholman | Vera Demberg
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Cross-linguistic research on discourse structure and coherence marking requires discourse-annotated corpora and connective lexicons in a large number of languages. However, the availability of such resources is limited, especially for languages for which linguistic resources are scarce in general, such as Nigerian Pidgin. In this study, we demonstrate how a semi-automatic approach can be used to source connectives and their relation senses and develop a discourse-annotated corpus in a low-resource language. Connectives and their relation senses were extracted from a parallel corpus combining automatic (PDTB end-to-end parser) and manual annotations. This resulted in Naija-Lex, a lexicon of discourse connectives in Nigerian Pidgin with English translations. The lexicon shows that the majority of Nigerian Pidgin connectives are borrowed from its English lexifier, but that there are also some connectives that are unique to Nigerian Pidgin.
