MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation

Scott Martin; Shivani Poddar; Kartikeya Upasani

Abstract

This paper introduces MuDoCo, a new dataset of authored dialogs between a fictional user and a system, who are given tasks to perform within six task domains. Expert linguists annotated these dialogs for several types of reference mentions and named entity mentions, either of which can span multiple words, as well as for coreference links between mentions. The dialogs sometimes cross and blend domains, and the users exhibit complex task-switching behavior, such as re-initiating a previous task in the dialog by referencing the entities within it. The dataset contains a total of 8,429 dialogs with an average of 5.36 turns per dialog. We are releasing this dataset to encourage research on coreference resolution and on referring expression generation and identification within realistic, deep dialogs involving multiple domains. To demonstrate its utility, we also propose two baseline models for the downstream tasks of coreference resolution and referring expression generation.

Anthology ID: 2020.lrec-1.13
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 104–111
Language: English
URL: https://aclanthology.org/2020.lrec-1.13/
Cite (ACL): Scott Martin, Shivani Poddar, and Kartikeya Upasani. 2020. MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 104–111, Marseille, France. European Language Resources Association.
Cite (Informal): MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation (Martin et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.13.pdf
