Workshop on Computational Models of Reference, Anaphora and Coreference (2018)
We present a corpus study of pronominal anaphora in Twitter conversations. After outlining the features of this genre that are relevant to reference resolution, we explain how we constructed our corpus and describe the annotation steps. From this, we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system and provide a qualitative error analysis.
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and annotating a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. However, the corpus has so far not been used extensively for anaphora resolution research. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task (identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis); the evaluation scripts that assess system performance on those datasets; and preliminary results on these three tasks that may serve as baselines for subsequent research on these phenomena.
We present two systems for bridging resolution, which we submitted to the CRAC shared task on bridging anaphora resolution in the ARRAU corpus (track 2): a rule-based approach following Hou et al. (2014) and a learning-based approach. The re-implementation of Hou et al. (2014) performs very poorly when applied to ARRAU. We found that the reason lies in the differing bridging annotations: whereas the rule-based system suggests many referential bridging pairs, ARRAU contains mostly lexical bridging. We describe the differences between these two types of bridging and adapt the rule-based approach so that it can handle lexical bridging. The modified rule-based approach achieves reasonable performance on all (sub)tasks and outperforms a simple learning-based approach.
Notional anaphors are pronouns that disagree with their antecedents’ grammatical categories for notional reasons, such as plural-to-singular agreement in “the government ... they”. Since such cases are rare and conflict with evidence from strictly agreeing cases (“the government ... it”), they present a substantial challenge to both coreference resolution and referring expression generation. Using the OntoNotes corpus, this paper takes an ensemble approach to predicting English notional anaphora in context, on the basis of the largest empirical dataset to date. In addition to state-of-the-art prediction accuracy, the results suggest that theoretical approaches positing a plural construal at the antecedent’s utterance are insufficient, and that circumstances at the location of the anaphor’s utterance, as well as global factors such as genre, have a strong effect on the choice of referring expression.
Resolving coreference and bridging often requires knowledge about semantic relations between anaphors and antecedents. We propose using state-of-the-art neural-network classifiers trained on relation benchmarks to predict likelihoods for these relations and integrating those likelihoods into our resolvers. Two experiments with representations differing in noise and complexity improve our bridging resolver but not our coreference resolver.
Bridging resolution is the task of recognising bridging anaphors and linking them to their antecedents. While there is some work on bridging resolution for English, there is little work on German. We present two datasets that contain bridging annotations, namely DIRNDL and GRAIN, and compare the performance of a rule-based system with a simple baseline approach on these two corpora. For full bridging resolution, F1 scores range from 11.8% on GRAIN to 13.6% on DIRNDL. An analysis using oracle lists suggests that the system could, to a certain extent, benefit from ranking and re-ranking antecedent candidates. Furthermore, we investigate the importance of individual features and show that the features used in our work seem promising for future bridging resolution approaches.
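The F1 figures above combine precision and recall over predicted anaphor-antecedent links. The sketch below assumes exact-match scoring of (anaphor, antecedent) pairs; the mention identifiers and example links are hypothetical, and the evaluation used in the paper may differ in detail (for example, in how partial mention matches are treated).

```python
from typing import Set, Tuple

# A bridging link as a hypothetical (anaphor_id, antecedent_id) pair.
Link = Tuple[str, str]

def precision_recall_f1(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Score predicted bridging links against gold links by exact pair match."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    gold = {("m3", "m1"), ("m7", "m5"), ("m9", "m2")}
    predicted = {("m3", "m1"), ("m7", "m4")}  # second link picks the wrong antecedent
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

With one of two predicted links correct and three gold links, precision is 0.5, recall is 1/3, and F1 is 0.4. Under pair-level scoring, a link counts only if both the anaphor and its antecedent are right, which is one reason full bridging resolution scores are low.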
This paper describes the design and evaluation of a system for the automatic detection and resolution of shell nouns in German. Shell nouns are general nouns, such as fact, question, or problem, whose full interpretation relies on a content phrase located elsewhere in the text, which these nouns simultaneously characterize and encapsulate. To detect and resolve them, the system uses a series of lexico-syntactic patterns that extract shell noun candidates and their content in parallel. Each pattern has its own classifier, which makes the final decision as to whether a link is established and the shell noun resolved. Overall, the system correctly identified about 26.2% of the annotated shell noun instances, and about 72.5% of these were assigned the correct content phrase. Although shell noun instances remain difficult to identify reliably (recall is correspondingly low), the system usually assigns the right content to correctly classified cases.
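The pattern-plus-classifier design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the two German patterns (a shell noun followed by a "dass" or "ob" clause) and the token-count rule standing in for each trained classifier are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ShellPattern:
    """One lexico-syntactic pattern paired with its own link classifier."""
    name: str
    regex: re.Pattern  # must define named groups "noun" and "content"
    classifier: Callable[[str, str], bool]  # decides whether to resolve the noun

def toy_classifier(noun: str, content: str) -> bool:
    # Hypothetical stand-in for a trained per-pattern classifier:
    # accept only content phrases of at least three tokens.
    return len(content.split()) >= 3

# Hypothetical patterns: "<shell noun>, dass ..." and "<shell noun>, ob ..."
PATTERNS = [
    ShellPattern(
        "N_dass",
        re.compile(r"\b(?P<noun>Tatsache|Problem|Idee),\s+(?P<content>dass\s+[^.,;]+)"),
        toy_classifier,
    ),
    ShellPattern(
        "N_ob",
        re.compile(r"\b(?P<noun>Frage),\s+(?P<content>ob\s+[^.,;]+)"),
        toy_classifier,
    ),
]

def resolve(sentence: str, patterns: List[ShellPattern]) -> List[Tuple[str, str, str]]:
    """Extract candidate and content together; keep links its classifier accepts."""
    links = []
    for pattern in patterns:
        for match in pattern.regex.finditer(sentence):
            noun, content = match.group("noun"), match.group("content")
            if pattern.classifier(noun, content):
                links.append((pattern.name, noun, content))
    return links

if __name__ == "__main__":
    print(resolve("Die Tatsache, dass es regnet, überrascht niemanden.", PATTERNS))
    print(resolve("Die Frage, ob er kommt, ist offen.", PATTERNS))
```

Giving each pattern its own classifier lets each decision model specialize in the syntax of one construction. Taken together, the reported figures imply that roughly 0.262 × 0.725 ≈ 19% of all annotated instances are both identified and assigned the correct content.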
We present PAWS, a multilingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In addition, the texts are syntactically parsed and word-aligned. PAWS is based on PCEDT 2.0 and continues the tradition of multilingual treebanks with coreference annotation. The paper focuses on the coreference annotation in PAWS and its language-specific differences. PAWS offers linguistic material that can be further leveraged in cross-lingual studies, especially on coreference.
We perform a fine-grained, large-scale analysis of coreference projection. By projecting gold coreference from Czech to English and vice versa on the Prague Czech-English Dependency Treebank 2.0 Coref, we establish an upper bound for the proposed projection approach for these two languages. Our detailed analysis combines an examination of projection's subtasks with an analysis of performance on individual mention types. The findings are accompanied by examples from the corpus.
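To illustrate why even gold coreference may not survive projection intact, here is a minimal surface-level sketch that maps each mention span through word alignments. It assumes a token-level alignment dictionary and projects a span to the smallest target span covering its aligned tokens; this heuristic and the toy English-Czech example are assumptions for illustration, not the projection approach evaluated in the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive start and end token indices

def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source mention to the smallest target span covering its aligned tokens."""
    start, end = span
    targets: Set[int] = set()
    for i in range(start, end + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no aligned counterpart, e.g. a pronoun dropped in the target
    return (min(targets), max(targets))

def project_chain(chain: List[Span], alignment: Dict[int, Set[int]]) -> List[Span]:
    """Project every mention of a coreference chain, dropping unaligned ones."""
    projected: List[Span] = []
    for mention in chain:
        target = project_span(mention, alignment)
        if target is not None and target not in projected:
            projected.append(target)
    return projected

if __name__ == "__main__":
    # English: John(0) said(1) he(2) was(3) tired(4)
    # Czech:   Jan(0) řekl(1) ,(2) že(3) byl(4) unavený(5)  (subject pronoun dropped)
    alignment = {0: {0}, 1: {1}, 3: {4}, 4: {5}}
    chain = [(0, 0), (2, 2)]  # "John" and "he" corefer
    print(project_chain(chain, alignment))  # the pronoun mention is lost
```

A chain reduced to a single mention no longer expresses coreference on the target side; this is one way projection can lose information even from gold input.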
Typological differences between English and Chinese suggest a stronger reliance on the salience of the antecedent during pronoun resolution in Chinese. We examined this hypothesis by correlating a difficulty measure of pronoun resolution, derived from the activation-based ACT-R model, with the brain activity of English and Chinese participants listening to the same audiobook during fMRI recording. The ACT-R model predicts higher overall difficulty for English speakers, which is supported at the brain level in left Broca’s area. More generally, the results confirm that a computational modeling approach can dissociate the different dimensions involved in the complex process of pronoun resolution in the brain.
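The abstract does not spell out how the difficulty measure is derived. As a hedged sketch, assuming it rests on standard ACT-R retrieval dynamics: an antecedent's base-level activation is B = ln(Σ t_j^(-d)) over the times t_j since each previous mention, with decay d conventionally set to 0.5, and retrieval latency is T = F·e^(-A). Lower activation (a less salient antecedent) yields slower, more difficult retrieval. Spreading activation and noise, which full ACT-R models add to A, are omitted here, and the latency factor F = 1 is arbitrary.

```python
import math
from typing import Sequence

def base_level_activation(times_since_mentions: Sequence[float], decay: float = 0.5) -> float:
    """ACT-R base-level learning: B = ln(sum_j t_j ** -d), with t_j in seconds."""
    return math.log(sum(t ** (-decay) for t in times_since_mentions))

def retrieval_latency(activation: float, latency_factor: float = 1.0) -> float:
    """ACT-R retrieval time: T = F * exp(-A); higher activation means faster retrieval."""
    return latency_factor * math.exp(-activation)

if __name__ == "__main__":
    # Hypothetical antecedents: one mentioned 1 s and 10 s ago, one only 10 s ago.
    salient = base_level_activation([1.0, 10.0])
    distant = base_level_activation([10.0])
    print(f"salient: A={salient:.3f}, T={retrieval_latency(salient):.3f}")
    print(f"distant: A={distant:.3f}, T={retrieval_latency(distant):.3f}")
```

Because T = F·e^(-ln S) = F/S for the summed decay terms S, more recent and more frequent mentions directly shorten predicted retrieval time.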
Anaphora resolution systems require both an enumeration of possible candidate antecedents and a process for identifying the antecedent. This paper focuses on (i) the impact of the form of the referring expression on entity-vs-event preferences and (ii) how properties of the passage interact with referential form. Two crowd-sourced story-continuation experiments were conducted, using constructed and naturally occurring passages, to see how participants interpret the pronouns It and This following a context sentence that makes both event and entity referents available. Our participants show a strong, but not categorical, bias to use This to refer to events and It to refer to entities. However, these preferences vary with passage characteristics such as verb class (a proxy in our constructed examples for the number of explicit and implicit entities) and with more subtle author intentions regarding subsequent re-mention (the original event-vs-entity re-mention of our corpus items).