PRODIGy: a PROfile-based DIalogue Generation dataset

Daniela Occhipinti, Serra Sinem Tekiroğlu, Marco Guerini


Abstract
Providing dialogue agents with a profile representation can improve their consistency and coherence, leading to better conversations. However, current profile-based dialogue datasets for training such agents contain either explicit profile representations that are simple and dialogue-specific, or implicit representations that are difficult to collect. In this work, we introduce the PRODIGy (PROfile-based DIalogue Generation) dataset, which brings diverse representations together, providing a more comprehensive profile dimension set for each speaker. This resource comprises more than 20k dialogues, sourced from movie scripts, aligned with speaker representations such as communication style, biography, personality and gender. Initial experiments with diverse baselines show that providing generative language models with these aspects of a profile, both separately and jointly, enhances models’ performance. This improvement holds true in both in-domain and cross-domain settings, for both fine-tuned and instruction-based LLMs.
Anthology ID:
2024.findings-naacl.222
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3500–3514
URL:
https://aclanthology.org/2024.findings-naacl.222
DOI:
10.18653/v1/2024.findings-naacl.222
Cite (ACL):
Daniela Occhipinti, Serra Sinem Tekiroğlu, and Marco Guerini. 2024. PRODIGy: a PROfile-based DIalogue Generation dataset. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3500–3514, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
PRODIGy: a PROfile-based DIalogue Generation dataset (Occhipinti et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.222.pdf