A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems

San Kim, Jin Yea Jang, Minyoung Jung, Saim Shin


Abstract
Building open-domain dialogue systems, which place no restriction on conversation topics, is a challenging problem in natural language processing (NLP). Methods that draw on dialogue-related knowledge have recently improved the performance of such systems; however, non-English dialogue systems struggle to match the performance of English ones because knowledge in the same language as the dialogue system is relatively difficult to obtain. Through experiments with a Korean dialogue system, this paper shows that the performance of a non-English dialogue system can be improved by using English knowledge, that is, by having the system use cross-lingual knowledge. For the experiments, we 1) constructed a Korean version of the Wizard of Wikipedia dataset, 2) built Korean-English T5 (KE-T5), a language model pre-trained on Korean and English corpora, and 3) developed a knowledge-grounded Korean dialogue model based on KE-T5. We observed a performance improvement in the open-domain Korean dialogue model even when only English knowledge was given. The experimental results show that the knowledge inherent in cross-lingual language models can be helpful for generating responses in open-domain dialogue systems.
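As a rough illustration of the cross-lingual setup the abstract describes, the sketch below shows one way Korean dialogue turns and English knowledge sentences could be joined into a single source string for a text-to-text model. The function name `build_model_input`, the `knowledge:` and `context:` prefixes, and the bracketed speaker tags are assumptions made for this example; the abstract does not specify the input format the authors used.

```python
def build_model_input(history, knowledge, max_turns=3):
    """Join recent dialogue turns and knowledge sentences into one source string.

    The prefixes and speaker tags are hypothetical, chosen only for illustration.
    """
    recent = history[-max_turns:]  # keep only the most recent turns
    context = " ".join(f"[{speaker}] {text}" for speaker, text in recent)
    facts = " ".join(s.strip() for s in knowledge if s.strip())
    return f"knowledge: {facts} context: {context}"


# Korean dialogue history paired with English knowledge (cross-lingual grounding).
history = [
    ("apprentice", "저는 재즈 음악을 좋아해요."),
    ("wizard", "재즈는 어디에서 시작되었는지 아세요?"),
    ("apprentice", "잘 모르겠어요. 알려 주세요."),
]
knowledge = ["Jazz is a music genre that originated in New Orleans."]

source = build_model_input(history, knowledge)
print(source)
```

A string built this way would be tokenized and passed to a sequence-to-sequence model that generates the Korean response; the model and decoding steps are omitted here.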
Anthology ID:
2021.findings-emnlp.33
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
352–365
URL:
https://aclanthology.org/2021.findings-emnlp.33
DOI:
10.18653/v1/2021.findings-emnlp.33
Cite (ACL):
San Kim, Jin Yea Jang, Minyoung Jung, and Saim Shin. 2021. A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 352–365, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems (Kim et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.33.pdf
Software:
 2021.findings-emnlp.33.Software.zip
Video:
 https://aclanthology.org/2021.findings-emnlp.33.mp4
Code
 airc-keti/ke-t5
Data
 C4, Wizard of Wikipedia