Persona-Consistent Dialogue Generation via Pseudo Preference Tuning

Junya Takayama, Masaya Ohagi, Tomoya Mizumoto, Katsumasa Yoshikawa


Abstract
We propose a simple yet effective method for enhancing persona consistency in dialogue response generation using Direct Preference Optimization (DPO). In our method, we generate responses from the response generation model using persona information randomly swapped in from other dialogues, and treat these responses as pseudo-negative samples. The reference responses serve as positive samples, allowing us to create pseudo-preference data. Experimental results demonstrate that our model, fine-tuned with DPO on the pseudo-preference data, produces more consistent and natural responses than models trained with supervised fine-tuning or with reinforcement learning based on entailment relations between personas and utterances.
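To make the data construction described above concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how pseudo-preference triples could be assembled: each dialogue's persona is replaced with one sampled from a different dialogue, the model generates a response under that mismatched persona as the rejected sample, and the reference response is kept as the chosen sample. The backbone model name, prompt template, and field names are assumptions.

```python
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical backbone; the paper's actual dialogue model may differ.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def format_prompt(persona, context):
    # Illustrative prompt template; the actual format used in the paper is an assumption.
    return "persona: " + " ".join(persona) + "\n" + "\n".join(context) + "\nresponse:"


def build_pseudo_preference_data(dialogues, seed=0):
    """Create (prompt, chosen, rejected) triples for DPO.

    Each dialogue is assumed to be a dict with keys:
      "persona"  - persona sentences of the speaker,
      "context"  - preceding dialogue turns,
      "response" - the reference (gold) response.
    """
    rng = random.Random(seed)
    data = []
    for dlg in dialogues:
        # Swap in a persona drawn from a *different* dialogue.
        other = rng.choice([d for d in dialogues if d is not dlg])
        swapped_prompt = format_prompt(other["persona"], dlg["context"])

        # Generate a response conditioned on the mismatched persona;
        # this serves as the pseudo-negative (rejected) sample.
        inputs = tokenizer(swapped_prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=64, do_sample=True)
        rejected = tokenizer.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )

        # The reference response under the original persona is the positive (chosen) sample.
        data.append({
            "prompt": format_prompt(dlg["persona"], dlg["context"]),
            "chosen": dlg["response"],
            "rejected": rejected,
        })
    return data
```

The resulting prompt/chosen/rejected triples match the format expected by common preference-tuning implementations (e.g., TRL's DPOTrainer), to which they could be passed for the subsequent DPO stage.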
Anthology ID: 2025.coling-main.369
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5507–5514
URL: https://aclanthology.org/2025.coling-main.369/
Cite (ACL): Junya Takayama, Masaya Ohagi, Tomoya Mizumoto, and Katsumasa Yoshikawa. 2025. Persona-Consistent Dialogue Generation via Pseudo Preference Tuning. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5507–5514, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Persona-Consistent Dialogue Generation via Pseudo Preference Tuning (Takayama et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.369.pdf