Kaiping Peng
2023
XDailyDialog: A Multilingual Parallel Dialogue Corpus
Zeming Liu | Ping Nie | Jie Cai | Haifeng Wang | Zheng-Yu Niu | Peng Zhang | Mrinmaya Sachan | Kaiping Peng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
High-quality datasets are significant to the development of dialogue models. However, most existing datasets for open-domain dialogue modeling are limited to a single language. The absence of multilingual open-domain dialog datasets not only limits the research on multilingual or cross-lingual transfer learning, but also hinders the development of robust open-domain dialog systems that can be deployed in other parts of the world. In this paper, we provide a multilingual parallel open-domain dialog dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialog. XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total). We then propose a dialog generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue. Experiment results show the effectiveness of this framework. We will make XDailyDialog and kNN-Chat publicly available soon.