%0 Conference Proceedings
%T Annotation Inconsistency and Entity Bias in MultiWOZ
%A Qian, Kun
%A Beirami, Ahmad
%A Lin, Zhouhan
%A De, Ankita
%A Geramifard, Alborz
%A Yu, Zhou
%A Sankar, Chinnadhurai
%Y Li, Haizhou
%Y Levow, Gina-Anne
%Y Yu, Zhou
%Y Gupta, Chitralekha
%Y Sisman, Berrak
%Y Cai, Siqi
%Y Vandyke, David
%Y Dethlefs, Nina
%Y Wu, Yan
%Y Li, Junyi Jessy
%S Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
%D 2021
%8 July
%I Association for Computational Linguistics
%C Singapore and Online
%F qian-etal-2021-annotation
%X MultiWOZ (Budzianowski et al., 2018) is one of the most popular multi-domain task-oriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG) and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., “cambridge” appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed as the test set suffers from a similar entity bias as well. We release a new test set with all entities replaced with unseen entities. Finally, we benchmark joint goal accuracy (JGA) of the state-of-the-art DST baselines on these modified versions of the data. Our experiments show that the annotation inconsistency corrections lead to 7-10% improvement in JGA. On the other hand, we observe a 29% drop in JGA when models are evaluated on the new test set with unseen entities.
%R 10.18653/v1/2021.sigdial-1.35
%U https://aclanthology.org/2021.sigdial-1.35
%U https://doi.org/10.18653/v1/2021.sigdial-1.35
%P 326-337