Dialogue Language Model with Large-Scale Persona Data Engineering

Mengze Hong; Chen Jason Zhang; Chaotao Chen; Rongzhong Lian; Di Jiang

doi:10.18653/v1/2025.naacl-industry.71

Abstract

Maintaining persona consistency is paramount in the application of open-domain dialogue systems, as exemplified by models like ChatGPT. Despite significant advancements, the limited scale and diversity of current persona dialogue datasets remain challenges to achieving robust persona-consistent dialogue models. In this study, drawing inspiration from the success of large-scale pre-training, we introduce PPDS, an open-domain persona dialogue system that employs extensive generative pre-training on a persona dialogue dataset to enhance persona consistency. Specifically, we present a persona extraction model designed to autonomously and precisely generate vast persona dialogue datasets. Additionally, we unveil a pioneering persona augmentation technique to address the invalid persona bias inherent in the constructed dataset. Both quantitative and human evaluations consistently highlight the superior response quality and persona consistency of our proposed model, underscoring its effectiveness.

Anthology ID: 2025.naacl-industry.71
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 961–970
URL: https://aclanthology.org/2025.naacl-industry.71/
DOI: 10.18653/v1/2025.naacl-industry.71
Cite (ACL): Mengze Hong, Chen Jason Zhang, Chaotao Chen, Rongzhong Lian, and Di Jiang. 2025. Dialogue Language Model with Large-Scale Persona Data Engineering. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 961–970, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Dialogue Language Model with Large-Scale Persona Data Engineering (Hong et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-industry.71.pdf
