Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets

Ehsan Lotfi; Maxime De Bruyn; Jeska Buhmann; Walter Daelemans

doi:10.18653/v1/2023.dialdoc-1.12

Abstract

Crowd-sourcing has been one of the primary ways to curate conversational data, especially for certain scenarios like grounding in knowledge. In this setting, using online platforms like AMT, non-expert participants are hired to converse with each other, following instructions which try to guide the outcome towards the desired format. The resulting data is then used for different parts of dialog modelling like knowledge selection and response selection/generation. In this work, we take a closer look at two of the most popular knowledge grounded dialog (KGD) datasets. Investigating potential biases and artefacts in knowledge selection labels, we observe that in many cases the ‘knowledge selection flow’ simply follows the order of presented knowledge pieces. In Wizard of Wikipedia (the most popular KGD dataset) we use simple content-agnostic models based on this bias to get significant knowledge selection performance. In Topical-Chat we see a similar correlation between the knowledge selection sequence and the order of entities and their segments, as provided to crowd-source workers. We believe that the observed results question the significance and origin of the presumed dialog-level attributes like ‘knowledge flow’ in these crowd-sourced datasets.

Anthology ID: 2023.dialdoc-1.12
Volume: Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
Venue: dialdoc
Publisher: Association for Computational Linguistics
Pages: 109–121
URL: https://aclanthology.org/2023.dialdoc-1.12/
DOI: 10.18653/v1/2023.dialdoc-1.12
Cite (ACL): Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, and Walter Daelemans. 2023. Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets. In Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, pages 109–121, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Follow the Knowledge: Structural Biases and Artefacts in Knowledge Grounded Dialog Datasets (Lotfi et al., dialdoc 2023)
PDF: https://aclanthology.org/2023.dialdoc-1.12.pdf
