A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models

Fabian Galetzka, Chukwuemeka Uchenna Eneh, David Schlangen


Abstract
Fully data-driven chatbots for non-goal-oriented dialogues are known to suffer from inconsistent behaviour across their turns, stemming from a general difficulty in controlling parameters like their assumed background personality and knowledge of facts. One reason for this is the relative lack of labeled data from which personality consistency and fact usage could be learned together with dialogue behaviour. To address this, we introduce a new labeled dialogue dataset in the domain of movie discussions, where every dialogue is based on pre-specified facts and opinions. We thoroughly validate the collected dialogues for adherence of the participants to their given fact and opinion profiles, and find that the general quality in this respect is high. This process also gives us an additional layer of annotation that is potentially useful for training models. As a baseline, we introduce an end-to-end self-attention decoder model trained on this data and show that it is able to generate opinionated responses that are judged to be natural, knowledgeable, and attentive.
Anthology ID:
2020.lrec-1.71
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
565–573
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.71
Cite (ACL):
Fabian Galetzka, Chukwuemeka Uchenna Eneh, and David Schlangen. 2020. A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 565–573, Marseille, France. European Language Resources Association.
Cite (Informal):
A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models (Galetzka et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.71.pdf
Code:
fabiangal/komodis-dataset