Workshop on Computational Models of Reference, Anaphora and Coreference (2021)


bib (full) Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

pdf bib
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Massimo Poesio | Yulia Grishina | Vincent Ng

pdf bib
A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution in English
Hongming Zhang | Xinran Zhao | Yangqiu Song

Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to. Compared with the general coreference resolution task, the main challenge of PCR is the coreference relation prediction rather than the mention detection. As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models, which motivates us to survey existing approaches and think about how to do better. In this survey, we first introduce representative datasets and models for the ordinary pronoun coreference resolution task. Then we focus on recent progress on hard pronoun coreference resolution problems (e.g., Winograd Schema Challenge) to analyze how well current models can understand commonsense. We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications (e.g., all SOTA models struggle on correctly resolving pronouns to infrequent objects). All experiment codes will be available upon acceptance.

pdf bib
Coreference Resolution for the Biomedical Domain: A Survey
Pengcheng Lu | Massimo Poesio

Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature. Thus, the biomedical genre has long been the second most researched genre for coreference resolution after the news domain, and the subject of a great deal of research for NLP in general. In recent years this interest has grown enormously leading to the development of a number of substantial datasets, of domain-specific contextual language models, and of several architectures. In this paper we review the state of-the-art of coreference in the biomedical domain with a particular attention on these most recent developments.

pdf bib
FantasyCoref: Coreference Resolution on Fantasy Literature Through Omniscient Writer’s Point of View
Sooyoun Han | Sumin Seo | Minji Kang | Jongin Kim | Nayoung Choi | Min Song | Jinho D. Choi

This paper presents a new corpus and annotation guideline for a novel coreference resolution task on fictional texts, and analyzes its unique characteristics. FantasyCoref contains 211 stories of Grimms’ Fairy Tales and 3 other fantasy literature annotated in the omniscient writer’s point of view (OWV) to handle distinctive aspects in this genre. This task is more challenging than general coreference resolution in two ways. First, documents in our corpus are 2.5 times longer than the ones in OntoNotes, raising a new layer of difficulty in resolving long-distant referents. Second, annotation of literary styles and concepts raise several issues which are not sufficiently addressed in the existing annotation guidelines. Hence, considerations on such issues and the concept of OWV are necessary to achieve high inter-annotator agreement (IAA) in coreference resolution of fictional texts. We carefully conduct annotation tasks in four stages to ensure the quality of our annotation. As a result, a high IAA score of 87% is achieved using the standard coreference evaluation metric. Finally, state-of-the-art coreference resolution approaches are evaluated on our corpus. After training with our annotated dataset, there was a 2.59% and 3.06% improvement over the model trained on the OntoNotes dataset. Also, we observe that the portion of errors specific to fictional texts declines after the training.

pdf bib
DramaCoref: A Hybrid Coreference Resolution System for German Theater Plays
Janis Pagel | Nils Reiter

We present a system for resolving coreference on theater plays, DramaCoref. The system uses neural network techniques to provide a list of potential mentions. These mentions are assigned to common entities using generic and domain-specific rules. We find that DramaCoref works well on the theater plays when compared to corpora from other domains and profits from the inclusion of information specific to theater plays. On the best-performing setup, it achieves a CoNLL score of 32% when using automatically detected mentions and 55% when using gold mentions. Single rules achieve high precision scores; however, rules designed on other domains are often not applicable or yield unsatisfactory results. Error analysis shows that the mention detection is the main weakness of the system, providing directions for future improvements.

pdf bib
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature
Andreas van Cranenburgh | Esther Ploeger | Frank van den Berg | Remi Thüss

We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers for the subtasks mention detection, mention attributes (gender, animacy, number), and pronoun resolution. The classifiers substantially increase coreference performance in our experiments with Dutch literature across all metrics on the development set: mention detection, LEA, CoNLL, and especially pronoun accuracy. However, on the test set, the best results are obtained with rule-based pronoun resolution. This inconsistent result highlights that the rule-based system is still a strong baseline, and more work is needed to improve pronoun resolution robustly for this dataset. While end-to-end neural systems require no feature engineering and achieve excellent performance in standard benchmarks with large training sets, our simple hybrid system scales well to long document coreference (>10k words) and attains superior results in our experiments on literature.

pdf bib
Lazy Low-Resource Coreference Resolution: a Study on Leveraging Black-Box Translation Tools
Semere Kiros Bitew | Johannes Deleu | Chris Develder | Thomas Demeester

Large annotated corpora for coreference resolution are available for few languages. For machine translation, however, strong black-box systems exist for many languages. We empirically explore the appealing idea of leveraging such translation tools for bootstrapping coreference resolution in languages with limited resources. Two scenarios are analyzed, in which a large coreference corpus in a high-resource language is used for coreference predictions in a smaller language, i.e., by machine translating either the training corpus or the test data. In our empirical evaluation of coreference resolution using the two scenarios on several medium-resource languages, we find no improvement over monolingual baseline models. Our analysis of the various sources of error inherent to the studied scenarios, reveals that in fact the quality of contemporary machine translation tools is the main limiting factor.

pdf bib
Resources and Evaluations for Danish Entity Resolution
Maria Barrett | Hieu Lam | Martin Wu | Ophélie Lacroix | Barbara Plank | Anders Søgaard

Automatic coreference resolution is understudied in Danish even though most of the Danish Dependency Treebank (Buch-Kromann, 2003) is annotated with coreference relations. This paper describes a conversion of its partial, yet well-documented, coreference relations into coreference clusters and the training and evaluation of coreference models on this data. To the best of our knowledge, these are the first publicly available, neural coreference models for Danish. We also present a new entity linking annotation on the dataset using WikiData identifiers, a named entity disambiguation (NED) dataset, and a larger automatically created NED dataset enabling wikily supervised NED models. The entity linking annotation is benchmarked using a state-of-the-art neural entity disambiguation model.

pdf bib
CoreLM: Coreference-aware Language Model Fine-Tuning
Nikolaos Stylianou | Ioannis Vlahavas

Language Models are the underpin of all modern Natural Language Processing (NLP) tasks. The introduction of the Transformers architecture has contributed significantly into making Language Modeling very effective across many NLP task, leading to significant advancements in the field. However, Transformers come with a big computational cost, which grows quadratically with respect to the input length. This presents a challenge as to understand long texts requires a lot of context. In this paper, we propose a Fine-Tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models so that they incorporate explicit entity information. By introducing entity representations, we make available information outside the contextual space of the model, which results in a better Language Model for a fraction of the computational cost. We implement our approach using GPT2 and compare the fine-tuned model to the original. Our proposed model achieves a lower Perplexity in GUMBY and LAMBDADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes. We also compare the models’ performance in terms of Accuracy in LAMBADA and Children’s Book Test, with and without the use of model-created coreference annotations.

pdf bib
Data Augmentation Methods for Anaphoric Zero Pronouns
Abdulrahman Aloraini | Massimo Poesio

In pro-drop language like Arabic, Chinese, Italian, Japanese, Spanish, and many others, unrealized (null) arguments in certain syntactic positions can refer to a previously introduced entity, and are thus called anaphoric zero pronouns. The existing resources for studying anaphoric zero pronoun interpretation are however still limited. In this paper, we use five data augmentation methods to generate and detect anaphoric zero pronouns automatically. We use the augmented data as additional training materials for two anaphoric zero pronoun systems for Arabic. Our experimental results show that data augmentation improves the performance of the two systems, surpassing the state-of-the-art results.

pdf bib
Exploring Pre-Trained Transformers and Bilingual Transfer Learning for Arabic Coreference Resolution
Bonan Min

In this paper, we develop bilingual transfer learning approaches to improve Arabic coreference resolution by leveraging additional English annotation via bilingual or multilingual pre-trained transformers. We show that bilingual transfer learning improves the strong transformer-based neural coreference models by 2-4 F1. We also systemically investigate the effectiveness of several pre-trained transformer models that differ in training corpora, languages covered, and model capacity. Our best model achieves a new state-of-the-art performance of 64.55 F1 on the Arabic OntoNotes dataset. Our code is publicly available at

pdf bib
Event and Entity Coreference using Trees to Encode Uncertainty in Joint Decisions
Nishant Yadav | Nicholas Monath | Rico Angell | Andrew McCallum

Coreference decisions among event mentions and among co-occurring entity mentions are highly interdependent, thus motivating joint inference. Capturing the uncertainty over each variable can be crucial for inference among multiple dependent variables. Previous work on joint coreference employs heuristic approaches, lacking well-defined objectives, and lacking modeling of uncertainty on each side of the joint problem. We present a new approach of joint coreference, including (1) a formal cost function inspired by Dasgupta’s cost for hierarchical clustering, and (2) a representation for uncertainty of clustering of event and entity mentions, again based on a hierarchical structure. We describe an alternating optimization method for inference that when clustering event mentions, considers the uncertainty of the clustering of entity mentions and vice-versa. We show that our proposed joint model provides empirical advantages over state-of-the-art independent and joint models.

pdf bib
On Generalization in Coreference Resolution
Shubham Toshniwal | Patrick Xia | Sam Wiseman | Karen Livescu | Kevin Gimpel

While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture by using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance, leading to better generalization in coreference resolution models. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.

pdf bib
Improving Span Representation for Domain-adapted Coreference Resolution
Nupoor Gandhi | Anjalie Field | Yulia Tsvetkov

Recent work has shown fine-tuning neural coreference models can produce strong performance when adapting to different domains. However, at the same time, this can require a large amount of annotated target examples. In this work, we focus on supervised domain adaptation for clinical notes, proposing the use of concept knowledge to more efficiently adapt coreference models to a new domain. We develop methods to improve the span representations via (1) a retrofitting loss to incentivize span representations to satisfy a knowledge-based distance function and (2) a scaffolding loss to guide the recovery of knowledge from the span representation. By integrating these losses, our model is able to improve our baseline precision and F-1 score. In particular, we show that incorporating knowledge with end-to-end coreference models results in better performance on the most challenging, domain-specific spans.

pdf bib
Coreference by Appearance: Visually Grounded Event Coreference Resolution
Liming Wang | Shengyu Feng | Xudong Lin | Manling Li | Heng Ji | Shih-Fu Chang

Event coreference resolution is critical to understand events in the growing number of online news with multiple modalities including text, video, speech, etc. However, the events and entities depicting in different modalities may not be perfectly aligned and can be difficult to annotate, which makes the task especially challenging with little supervision available. To address the above issues, we propose a supervised model based on attention mechanism and an unsupervised model based on statistical machine translation, capable of learning the relative importance of modalities for event coreference resolution. Experiments on a video multimedia event dataset show that our multimodal models outperform text-only systems in event coreference resolution tasks. A careful analysis reveals that the performance gain of the multimodal model especially under unsupervised settings comes from better learning of visually salient events.

pdf bib
Anatomy of OntoGUMAdapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms
Yilun Zhu | Sameer Pradhan | Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark. However lack of comparable data following the same scheme for more genres makes it difficult to evaluate generalizability to open domain data. Zhu et al. (2021) introduced the creation of the OntoGUM corpus for evaluating geralizability of the latest neural LM-based end-to-end systems. This paper covers details of the mapping process which is a set of deterministic rules applied to the rich syntactic and discourse annotations manually annotated in the GUM corpus. Out-of-domain evaluation across 12 genres shows nearly 15-20% degradation for both deterministic and deep learning systems, indicating a lack of generalizability or covert overfitting in existing coreference resolution models.

pdf bib
Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution
Zhaofeng Wu | Matt Gardner

Despite significant recent progress in coreference resolution, the quality of current state-of-the-art systems still considerably trails behind human-level performance. Using the CoNLL-2012 and PreCo datasets, we dissect the best instantiation of the mainstream end-to-end coreference resolution model that underlies most current best-performing coreference systems, and empirically analyze the behavior of its two components: mention detector and mention linker. While the detector traditionally focuses heavily on recall as a design decision, we demonstrate the importance of precision, calling for their balance. However, we point out the difficulty in building a precise detector due to its inability to make important anaphoricity decisions. We also highlight the enormous room for improving the linker and show that the rest of its errors mainly involve pronoun resolution. We propose promising next steps and hope our findings will help future research in coreference resolution.


bib (full) Proceedings of the 2nd Workshop on Computational Approaches to Discourse

pdf bib
Proceedings of the 2nd Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube | Amir Zeldes

pdf bib
I’ll be there for you”: The One with Understanding Indirect Answers
Cathrine Damgaard | Paulina Toborek | Trine Eriksen | Barbara Plank

Indirect answers are replies to polar questions without the direct use of word cues such as ‘yes’ and ‘no’. Humans are very good at understanding indirect answers, such as ‘I gotta go home sometime’, when asked ‘You wanna crash on the couch?’. Understanding indirect answers is a challenging problem for dialogue systems. In this paper, we introduce a new English corpus to study the problem of understanding indirect answers. Instead of crowd-sourcing both polar questions and answers, we collect questions and indirect answers from transcripts of a prominent TV series and manually annotate them for answer type. The resulting dataset contains 5,930 question-answer pairs. We release both aggregated and raw human annotations. We present a set of experiments in which we evaluate Convolutional Neural Networks (CNNs) for this task, including a cross-dataset evaluation and experiments with learning from disagreements in annotation. Our results show that the task of interpreting indirect answers remains challenging, yet we obtain encouraging improvements when explicitly modeling human disagreement.

pdf bib
Developing Conversational Data and Detection of Conversational Humor in Telugu
Vaishnavi Pamulapati | Radhika Mamidi

In the field of humor research, there has been a recent surge of interest in the sub-domain of Conversational Humor (CH). This study has two main objectives. (a) develop a conversational (humorous and non-humorous) dataset in Telugu. (b) detect CH in the compiled dataset. In this paper, the challenges faced while collecting the data and experiments carried out are elucidated. Transfer learning and non-transfer learning techniques are implemented by utilizing pre-trained models such as FastText word embeddings, BERT language models and Text GCN, which learns the word and document embeddings simultaneously of the corpus given. State-of-the-art results are observed with a 99.3% accuracy and a 98.5% f1 score achieved by BERT.

pdf bib
Investigating non lexical markers of the language of schizophrenia in spontaneous conversations
Chuyuan Li | Maxime Amblard | Chloé Braud | Caroline Demily | Nicolas Franck | Michel Musiol

We investigate linguistic markers associated with schizophrenia in clinical conversations by detecting predictive features among French-speaking patients. Dealing with human-human dialogues makes for a realistic situation, but it calls for strategies to represent the context and face data sparsity. We compare different approaches for data representation – from individual speech turns to entire conversations –, and data modeling, using lexical, morphological, syntactic, and discourse features, dimensions presumed to be tightly connected to the language of schizophrenia. Previous English models were mostly lexical and reached high performance, here replicated (93.7% acc.). However, our analysis reveals that these models are heavily biased, which probably concerns most datasets on this task. Our new delexicalized models are more general and robust, with the best accuracy score at 77.9%.

pdf bib
Discourse-Driven Integrated Dialogue Development Environment for Open-Domain Dialogue Systems
Denis Kuznetsov | Dmitry Evseev | Lidia Ostyakova | Oleg Serikov | Daniel Kornev | Mikhail Burtsev

Development environments for spoken dialogue systems are popular today because they enable rapid creation of the dialogue systems in times when usage of the voice AI Assistants is constantly growing. We describe a graphical Discourse-Driven Integrated Dialogue Development Environment (DD-IDDE) for spoken open-domain dialogue systems. The DD-IDDE allows dialogue architects to interactively define dialogue flows of their skills/chatbots with the aid of the discourse-driven recommendation system, enhance these flows in the Python-based DSL, deploy, and then further improve based on the skills/chatbots usage statistics. We show how these skills/chatbots can be specified through a graphical user interface within the VS Code Extension, and then run on top of the Dialog Flow Framework (DFF). An earlier version of this framework has been adopted in one of the Alexa Prize 4 socialbots while the updated version was specifically designed to power the described DD-IDDE solution.

pdf bib
Coreference Chains Categorization by Sequence Clustering
Silvia Federzoni | Lydia-Mai Ho-Dac | Cécile Fabre

The diversity of coreference chains is usually tackled by means of global features (length, types and number of referring expressions, distance between them, etc.). In this paper, we propose a novel approach that provides a description of their composition in terms of sequences of expressions. To this end, we apply sequence analysis techniques to bring out the various strategies for introducing a referent and keeping it active throughout discourse. We discuss a first application of this method to a French written corpus annotated with coreference chains. We obtain clusters that are linguistically coherent and interpretable in terms of reference strategies and we demonstrate the influence of text genre and semantic type of the referent on chain composition.

pdf bib
Resolving Implicit References in Instructional Texts
Talita Anthonio | Michael Roth

The usage of (co-)referring expressions in discourse contributes to the coherence of a text. However, text comprehension can be difficult when referring expressions are non-verbalized and have to be resolved in the discourse context. In this paper, we propose a novel dataset of such implicit references, which we automatically derive from insertions of references in collaboratively edited how-to guides. Our dataset consists of 6,014 instances, making it one of the largest datasets of implicit references and a useful starting point to investigate misunderstandings caused by underspecified language. We test different methods for resolving implicit references in our dataset based on the Generative Pre-trained Transformer model (GPT) and compare them to heuristic baselines. Our experiments indicate that GPT can accurately resolve the majority of implicit references in our data. Finally, we investigate remaining errors and examine human preferences regarding different resolutions of an implicit reference given the discourse context.

pdf bib
A practical perspective on connective generation
Frances Yung | Merel Scholman | Vera Demberg

In data-driven natural language generation, we typically know what relation should be expressed and need to select a connective to lexicalize it. In the current contribution, we analyse whether a sophisticated connective generation module is necessary to select a connective, or whether this can be solved with simple methods (such as random choice between connectives that are known to express a given relation, or usage of a generic language model). Comparing these methods to the distributions of connective choices from a human connective insertion task, we find mixed results: for some relations, it is acceptable to lexicalize them using any of the connectives that mark this relation. However, for other relations (temporals, concessives) either a more detailed relation distinction needs to be introduced, or a more sophisticated connective choice module would be necessary.

pdf bib
Semi-automatic discourse annotation in a low-resource language: Developing a connective lexicon for Nigerian Pidgin
Marian Marchal | Merel Scholman | Vera Demberg

Cross-linguistic research on discourse structure and coherence marking requires discourse-annotated corpora and connective lexicons in a large number of languages. However, the availability of such resources is limited, especially for languages for which linguistic resources are scarce in general, such as Nigerian Pidgin. In this study, we demonstrate how a semi-automatic approach can be used to source connectives and their relation senses and develop a discourse-annotated corpus in a low-resource language. Connectives and their relation senses were extracted from a parallel corpus combining automatic (PDTB end-to-end parser) and manual annotations. This resulted in Naija-Lex, a lexicon of discourse connectives in Nigerian Pidgin with English translations. The lexicon shows that the majority of Nigerian Pidgin connectives are borrowed from its English lexifier, but that there are also some connectives that are unique to Nigerian Pidgin.

pdf bib
Comparison of methods for explicit discourse connective identification across various domains
Merel Scholman | Tianai Dong | Frances Yung | Vera Demberg

Existing parse methods use varying approaches to identify explicit discourse connectives, but their performance has not been consistently evaluated in comparison to each other, nor have they been evaluated consistently on text other than newspaper articles. We here assess the performance on explicit connective identification of three parse methods (PDTB e2e, Lin et al., 2014; the winner of CONLL2015, Wang et al., 2015; and DisSent, Nie et al., 2019), along with a simple heuristic. We also examine how well these systems generalize to different datasets, namely written newspaper text (PDTB), written scientific text (BioDRB), prepared spoken text (TED-MDB) and spontaneous spoken text (Disco-SPICE). The results show that the e2e parser outperforms the other parse methods in all datasets. However, performance drops significantly from the PDTB to all other datasets. We provide a more fine-grained analysis of domain differences and connectives that prove difficult to parse, in order to highlight the areas where gains can be made.

pdf bib
Revisiting Shallow Discourse Parsing in the PDTB-3: Handling Intra-sentential Implicits
Zheng Zhao | Bonnie Webber

In the PDTB-3, several thousand implicit discourse relations were newly annotated within individual sentences, adding to the over 15,000 implicit relations annotated across adjacent sentences in the PDTB-2. Given that the position of the arguments to these intra-sentential implicits is no longer as well-defined as with inter-sentential implicits, a discourse parser must identify both their location and their sense. That is the focus of the current work. The paper provides a comprehensive analysis of our results, showcasing model performance under different scenarios, pointing out limitations and noting future directions.

pdf bib
Improving Multi-Party Dialogue Discourse Parsing via Domain Integration
Zhengyuan Liu | Nancy Chen

While multi-party conversations are often less structured than monologues and documents, they are implicitly organized by semantic level correlations across the interactive turns, and dialogue discourse analysis can be applied to predict the dependency structure and relations between the elementary discourse units, and provide feature-rich structural information for downstream tasks. However, the existing corpora with dialogue discourse annotation are collected from specific domains with limited sample sizes, rendering the performance of data-driven approaches poor on incoming dialogues without any domain adaptation. In this paper, we first introduce a Transformer-based parser, and assess its cross-domain performance. We next adopt three methods to gain domain integration from both data and language modeling perspectives to improve the generalization capability. Empirical results show that the neural parser can benefit from our proposed methods, and performs better on cross-domain dialogue samples.

pdf bib
discopy: A Neural System for Shallow Discourse Parsing
René Knaebel

This paper demonstrates discopy, a novel framework that makes it easy to design components for end-to-end shallow discourse parsing. For the purpose of demonstration, we implement recent neural approaches and integrate contextualized word embeddings to predict explicit and non-explicit discourse relations. Our proposed neural feature-free system performs competitively to systems presented at the latest Shared Task on Shallow Discourse Parsing. Finally, a web front end is shown that simplifies the inspection of annotated documents. The source code, documentation, and pretrained models are publicly accessible.

pdf bib
Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces
Ekaterina Lapshinova-Koltunski | Heike Przybyl | Yuri Bizzoni

In the present paper, we explore lexical contexts of discourse markers in translation and interpreting on the basis of word embeddings. Our special interest is on contextual variation of the same discourse markers in (written) translation vs. (simultaneous) interpreting. To explore this variation at the lexical level, we use a data-driven approach: we compare bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora. Our results show more variation of semantically related items in translation spaces vs. interpreting ones and a more consistent use of fewer connectives in interpreting. We also observe different trends with regard to the discourse relation types.

pdf bib
Capturing document context inside sentence-level neural machine translation models with self-training
Elman Mansimov | Gábor Melis | Lei Yu

Neural machine translation (NMT) has arguably achieved human level parity when trained and evaluated at the sentence-level. Document-level neural machine translation has received less attention and lags behind its sentence-level counterpart. The majority of the proposed document-level approaches investigate ways of conditioning the model on several source or target sentences to capture document context. These approaches require training a specialized NMT model from scratch on parallel document-level corpora. We propose an approach that doesn’t require training a specialized model on parallel document-level corpora and is applied to a trained sentence-level NMT model at decoding time. We process the document from left to right multiple times and self-train the sentence-level model on pairs of source sentences and generated translations. Our approach reinforces the choices made by the model, thus making it more likely that the same choices will be made in other sentences in the document. We evaluate our approach on three document-level datasets: NIST Chinese-English, WMT19 Chinese-English and OpenSubtitles English-Russian. We demonstrate that our approach has higher BLEU score and higher human preference than the baseline. Qualitative analysis of our approach shows that choices made by model are consistent across the document.

pdf bib
DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
Zhengyuan Liu | Ke Shi | Nancy Chen

Text discourse parsing weighs importantly in understanding information flow and argumentative structure in natural language, making it beneficial for downstream tasks. While previous work significantly improves the performance of RST discourse parsing, they are not readily applicable to practical use cases: (1) EDU segmentation is not integrated into most existing tree parsing frameworks, thus it is not straightforward to apply such models on newly-coming data. (2) Most parsers cannot be used in multilingual scenarios, because they are developed only in English. (3) Parsers trained from single-domain treebanks do not generalize well on out-of-domain inputs. In this work, we propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly. Moreover, we propose a cross-translation augmentation strategy to enable the framework to support multilingual parsing and improve its domain generality. Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.

pdf bib
Visualizing Cross‐Lingual Discourse Relations in Multilingual TED Corpora
Zae Myung Kim | Vassilina Nikoulina | Dongyeop Kang | Didier Schwab | Laurent Besacier

This paper presents an interactive data dashboard that provides users with an overview of the preservation of discourse relations among 28 language pairs. We display a graph network depicting the cross-lingual discourse relations between a pair of languages for multilingual TED talks and provide a search function to look for sentences with specific keywords or relation types, facilitating ease of analysis on the cross-lingual discourse relations.


bib (full) Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

pdf bib
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé

pdf bib
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Juntao Yu | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé

In this paper, we provide an overview of the CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in different genres of conversations. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple subtasks focusing individually on these key relations. We discuss the evaluation scripts used to assess the system performance on these subtasks, and provide a brief summary of the participating systems and the results obtained across ?? runs from 5 teams, with most submissions achieving significantly better results than our baseline methods.

pdf bib
Neural Anaphora Resolution in Dialogue
Hideo Kobayashi | Shengjie Li | Vincent Ng

We describe the systems that we developed for the three tracks of the CODI-CRAC 2021 shared task, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. Our team ranked second for entity coreference resolution, first for bridging resolution, and first for discourse deixis resolution.

pdf bib
Anaphora Resolution in Dialogue: Description of the DFKI-TalkingRobots System for the CODI-CRAC 2021 Shared-Task
Tatiana Anikina | Cennet Oguz | Natalia Skachkova | Siyu Tao | Sharmila Upadhyaya | Ivana Kruijff-Korbayova

We describe the system developed by the DFKI-TalkingRobots Team for the CODI-CRAC 2021 Shared-Task on anaphora resolution in dialogue. Our system consists of three subsystems: (1) the Workspace Coreference System (WCS) incrementally clusters mentions using semantic similarity based on embeddings combined with lexical feature heuristics; (2) the Mention-to-Mention (M2M) coreference resolution system pairs same entity mentions; (3) the Discourse Deixis Resolution (DDR) system employs a Siamese Network to detect discourse anaphor-antecedent pairs. WCS achieved F1-score of 55.6% averaged across the evaluation test sets, M2M achieved 57.2% and DDR achieved 21.5%.

pdf bib
The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference
Hongjin Kim | Damrin Kim | Harksoo Kim

The objective of anaphora resolution in dialogue shared-task is to go above and beyond the simple cases of coreference resolution in written text on which NLP has mostly focused so far, which arguably overestimate the performance of current SOTA models. The anaphora resolution in dialogue shared-task consists of three subtasks; subtask1, resolution of anaphoric identity and non-referring expression identification, subtask2, resolution of bridging references, and subtask3, resolution of discourse deixis/abstract anaphora. In this paper, we propose the pipelined model (i.e., a resolution of anaphoric identity and a resolution of bridging references) for the subtask1 and the subtask2. In the subtask1, our model detects mention via the parentheses prediction. Then, we yield mention representation using the token representation constituting the mention. Mention representation is fed to the coreference resolution model for clustering. In the subtask2, our model resolves bridging references via the MRC framework. We construct query for each entity as “What is related of ENTITY?”. The input of our model is query and documents(i.e., all utterances of dialogue). Then, our model predicts entity span that is answer for query.

pdf bib
An End-to-End Approach for Full Bridging Resolution
Joseph Renner | Priyansh Trivedi | Gaurav Maheshwari | Rémi Gilleron | Pascal Denis

In this article, we describe our submission to the CODI-CRAC 2021 Shared Task on Anaphora Resolution in Dialogues – Track BR (Gold). We demonstrate the performance of an end-to-end transformer-based higher-order coreference model finetuned for the task of full bridging. We find that while our approach is not effective at modeling the complexities of the task, it performs well on bridging resolution, suggesting a need for investigations into a robust anaphor identification model for future improvements.

pdf bib
Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues
Liyan Xu | Jinho D. Choi

We present an effective system adapted from the end-to-end neural coreference resolution model, targeting on the task of anaphora resolution in dialogues. Three aspects are specifically addressed in our approach, including the support of singletons, encoding speakers and turns throughout dialogue interactions, and knowledge transfer utilizing existing resources. Despite the simplicity of our adaptation strategies, they are shown to bring significant impact to the final performance, with up to 27 F1 improvement over the baseline. Our final system ranks the 1st place on the leaderboard of the anaphora resolution track in the CRAC 2021 shared task, and achieves the best evaluation results on all four datasets.

pdf bib
Anaphora Resolution in Dialogue: Cross-Team Analysis of the DFKI-TalkingRobots Team Submissions for the CODI-CRAC 2021 Shared-Task
Natalia Skachkova | Cennet Oguz | Tatiana Anikina | Siyu Tao | Sharmila Upadhyaya | Ivana Kruijff-Korbayova

We compare our team’s systems to others submitted for the CODI-CRAC 2021 Shared-Task on anaphora resolution in dialogue. We analyse the architectures and performance, report some problematic cases in gold annotations, and suggest possible improvements of the systems, their evaluation, data annotation, and the organization of the shared task.

pdf bib
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis
Shengjie Li | Hideo Kobayashi | Vincent Ng

The CODI-CRAC 2021 shared task is the first shared task that focuses exclusively on anaphora resolution in dialogue and provides three tracks, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. We perform a cross-task analysis of the systems that participated in the shared task in each of these tracks.