Merel C. J. Scholman
Also published as: Merel C.J. Scholman
2023
Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design
Valentina Pyatkin | Frances Yung | Merel C. J. Scholman | Reut Tsarfaty | Ido Dagan | Vera Demberg
Transactions of the Association for Computational Linguistics, Volume 11
Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias—task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations’ ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.
2021
Is there less annotator agreement when the discourse relation is underspecified?
Jet Hoek | Merel C.J. Scholman | Ted J.M. Sanders
Proceedings of the First Workshop on Integrating Perspectives on Discourse Annotation
2019
How compatible are our discourse annotation frameworks? Insights from mapping RST-DT and PDTB annotations
Vera Demberg | Merel C.J. Scholman | Fatemeh Torabi Asr
Dialogue & Discourse, Volume 10
Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes joint usage of the annotations difficult, preventing researchers from searching the corpora in a unified way, or using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapping the relational labels of different frameworks to each other, but these proposals have so far not been validated against existing annotations. The two largest discourse relation annotated resources, the Penn Discourse Treebank and the Rhetorical Structure Theory Discourse Treebank, have however been annotated on the same texts, allowing for a direct comparison of the annotation layers. We propose a method for automatically aligning the discourse segments, and then evaluate existing mapping proposals by comparing the empirically observed against the proposed mappings. Our analysis highlights the influence of segmentation on subsequent discourse relation labelling, and shows that while agreement between frameworks is reasonable for explicit relations, agreement on implicit relations is low. We identify several sources of systematic discrepancies between the two annotation schemes and discuss consequences for future annotation and for usage of the existing resources.
2017
On Temporality in Discourse Annotation: Theoretical and Practical Considerations
Jacqueline Evers-Vermeul | Jet Hoek | Merel C.J. Scholman
Dialogue & Discourse, Volume 8
Temporal information is one of the prominent features that determine coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach to Coherence Relations (CCR) to be considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We present the merits and drawbacks of different options for reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.
Examples and Specifications that Prove a Point: Identifying Elaborative and Argumentative Discourse Relations
Merel C.J. Scholman | Vera Demberg
Dialogue & Discourse, Volume 8
Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Examining how these relations are annotated in existing discourse corpora, we find that annotators often disagree on them; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can both illustrate or specify a situation and serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.
2016
Categories of coherence relations in discourse annotation
Merel C.J. Scholman | Jacqueline Evers-Vermeul | Ted J.M. Sanders
Dialogue & Discourse, Volume 7
Over the last decades, annotating discourse coherence relations has attracted increasing interest from the linguistics research community. Because of the complexity of coherence relations, there is no agreed-upon annotation standard. Current annotation methods often lack a systematic ordering of coherence relations. In this article, we investigate the usability of the cognitive approach to coherence relations, developed by Sanders et al. (1992, 1993), for discourse annotation. The theory proposes a taxonomy of coherence relations in terms of four cognitive primitives. We first develop a systematic, step-wise annotation process. The reliability of this annotation scheme is then tested in an annotation experiment with non-trained, non-expert annotators. An implicit and an explicit version of the annotation instruction were created to determine whether the type of instruction influences annotator agreement. The results show that two of the four primitives, polarity and order of the segments, can be applied reliably by non-trained annotators. The other two primitives, basic operation and source of coherence, are more problematic. Participants using the explicit instruction show higher agreement on the primitives than participants using the implicit instruction. These results are comparable to agreement statistics of other discourse corpora annotated by trained, expert annotators. Given that non-trained, non-expert annotators show similar levels of agreement, these results indicate that the cognitive approach to coherence relations is a promising method for annotating discourse.