OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue

Wen Cui, Leanne Rolston, Marilyn Walker, Beth Ann Hockey


Abstract
Entity linking in dialogue is the task of mapping entity mentions in utterances to entries in a target knowledge base. Prior work on entity linking has mainly focused on well-edited text such as Wikipedia articles, annotated newswire, or domain-specific datasets. We extend the study of entity linking to open-domain dialogue by presenting the OpenEL corpus: an annotated multi-domain corpus for linking entities in natural conversation to Wikidata. Each utterance in 179 dialogues covering 12 topics from the EDINA dataset has been annotated for entities realized by definite referring expressions as well as by anaphoric forms such as he, she, it and they. The dataset supports training and evaluation of entity linking in open-domain dialogue, as well as analysis of the effect of dialogue context and anaphora resolution on model training. It could also be used to fine-tune a coreference resolution algorithm. To the best of our knowledge, this is the first substantial entity linking corpus publicly available for open-domain dialogue. We also establish baselines for this task using several existing entity linking systems. The Transformer-based system Flair + BLINK performs best, with an F1 score of 0.65. Our results show that dialogue context is highly beneficial for entity linking in conversation: without discourse context, the F1 of Flair + BLINK falls to 0.61. These results also show the remaining gap between the baselines and human performance, highlighting the challenges of entity linking in open-domain dialogue and suggesting many avenues for future research using OpenEL.
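The abstract reports baseline results as F1 over entity links, where links include anaphoric mentions such as he as well as names. A minimal sketch of such a score follows, assuming a link counts as correct only when its turn, character span and Wikidata QID all match a gold link exactly. The record layout and the names `Link`, `span` and `link_prf` are hypothetical illustrations, not the OpenEL file format or its evaluation script, which may use different matching rules.

```python
from typing import Set, Tuple

# Hypothetical link record: (turn index, mention start, mention end, Wikidata QID).
# This layout is an illustrative assumption, not the OpenEL file format.
Link = Tuple[int, int, int, str]


def span(text: str, mention: str) -> Tuple[int, int]:
    """Return character offsets of the first occurrence of `mention` in `text`."""
    start = text.index(mention)  # raises ValueError if the mention is absent
    return start, start + len(mention)


def link_prf(gold: Set[Link], pred: Set[Link]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1; a predicted link is a true positive
    only if turn, span and QID all match a gold link (an assumed criterion)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


turns = [
    "Have you read anything by Douglas Adams?",
    "Yes, he wrote a very funny science fiction series.",
]
# Q42 is the Wikidata item for Douglas Adams; the pronoun "he" in turn 1 refers to him too.
gold = {
    (0, *span(turns[0], "Douglas Adams"), "Q42"),
    (1, *span(turns[1], "he"), "Q42"),
}
# A system that links the name but misses the pronoun, i.e. ignores dialogue context.
pred = {(0, *span(turns[0], "Douglas Adams"), "Q42")}

p, r, f1 = link_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=1.00 R=0.50 F1=0.67
```

In this toy case, missing the pronoun link costs recall but not precision. This mirrors why resolving anaphora from earlier turns can matter for the F1 figures above.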
Anthology ID: 2022.lrec-1.241
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 2245–2256
URL: https://aclanthology.org/2022.lrec-1.241
Cite (ACL): Wen Cui, Leanne Rolston, Marilyn Walker, and Beth Ann Hockey. 2022. OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2245–2256, Marseille, France. European Language Resources Association.
Cite (Informal): OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue (Cui et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.241.pdf
Code: wenzi3241/openel_corpus
Data: Wizard of Wikipedia