Semantic data augmentation for meaning maintenance on Task-Oriented Conversation with Large-size Language Model

Jaehwan Lee, Kwanyoung Son, Eugene Kim


Abstract
This paper presents our approach to building a generalized model for DSTC11 Track 5, “Task-oriented Conversational Modeling with Subjective Knowledge,” which addresses the challenge of generating responses to users’ utterances grounded in a variety of factual and subjective knowledge. To tackle this challenge, we first augmented the training data using contextual word embeddings and back-translation, thereby increasing the amount of available data. We then used a large-size language model to improve the acceptability of the augmented data and fine-tuned the model on the augmented data. Specifically, we applied the DeBERTa-v3-large model for knowledge detection and selection, and the BART-large model for response generation. Our best model ranked seventh in the objective evaluation and second in the final official human evaluation. These results indicate that data augmentation and the use of a large-size model were effective for developing a conversational system that incorporates objective and subjective knowledge.
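The augmentation steps named in the abstract (back-translation followed by an acceptability check) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `back_translate` and `augment`, the pluggable `to_pivot`, `from_pivot`, and `is_acceptable` callables, and the toy stand-ins below are all assumptions made for illustration. In practice the translators would be machine translation models and the acceptability check would be a language-model-based scorer.

```python
from typing import Callable, List


def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(utterances: List[str],
            to_pivot: Callable[[str], str],
            from_pivot: Callable[[str], str],
            is_acceptable: Callable[[str, str], bool]) -> List[str]:
    """Return the originals plus accepted, non-duplicate paraphrases."""
    result = list(utterances)
    seen = {u.strip().lower() for u in result}
    for original in utterances:
        candidate = back_translate(original, to_pivot, from_pivot)
        key = candidate.strip().lower()
        if key in seen:
            continue  # back-translation produced nothing new
        if not is_acceptable(original, candidate):
            continue  # hypothetical filter rejected the paraphrase
        seen.add(key)
        result.append(candidate)
    return result


# Toy stand-ins (assumptions): real systems would call translation models.
def toy_to_pivot(text: str) -> str:
    return text.replace("cheap", "bon marché")


def toy_from_pivot(text: str) -> str:
    return text.replace("bon marché", "inexpensive")


def toy_is_acceptable(original: str, candidate: str) -> bool:
    # Crude proxy for acceptability: token counts must be close.
    return abs(len(original.split()) - len(candidate.split())) <= 1


demo = ["Is the hotel cheap?", "Does it have wifi?"]
augmented = augment(demo, toy_to_pivot, toy_from_pivot, toy_is_acceptable)
```

With the toy functions, only the first utterance yields a new paraphrase; the second comes back unchanged and is skipped as a duplicate. Passing the filter as a callable keeps the sketch independent of any particular model or scoring method.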
Anthology ID:
2023.dstc-1.19
Volume:
Proceedings of The Eleventh Dialog System Technology Challenge
Month:
September
Year:
2023
Address:
Prague, Czech Republic
Editors:
Yun-Nung Chen, Paul Crook, Michel Galley, Sarik Ghazarian, Chulaka Gunasekara, Raghav Gupta, Behnam Hedayatnia, Satwik Kottur, Seungwhan Moon, Chen Zhang
Venues:
DSTC | WS
Publisher:
Association for Computational Linguistics
Pages:
166–176
URL:
https://aclanthology.org/2023.dstc-1.19
Cite (ACL):
Jaehwan Lee, Kwanyoung Son, and Eugene Kim. 2023. Semantic data augmentation for meaning maintenance on Task-Oriented Conversation with Large-size Language Model. In Proceedings of The Eleventh Dialog System Technology Challenge, pages 166–176, Prague, Czech Republic. Association for Computational Linguistics.
Cite (Informal):
Semantic data augmentation for meaning maintenance on Task-Oriented Conversation with Large-size Language Model (Lee et al., DSTC-WS 2023)
PDF:
https://aclanthology.org/2023.dstc-1.19.pdf