Exploring Prompt-based Multi-task Learning for Multimodal Dialog State Tracking and Immersive Multimodal Conversation

Yirong Chen, Ya Li, Tao Wang, Xiaofen Xing, Xiangmin Xu, Quan Liu, Cong Liu, Guoping Hu


Abstract
With the rise of the metaverse, immersive multimodal conversation has attracted increasing attention from researchers. Multimodal contexts will become more important for human-computer interaction in the metaverse, especially in the shopping domain. Unlike traditional conversation tasks, immersive multimodal conversation poses challenges such as multimodal ambiguous candidate identification and multimodal coreference resolution, which make dialog state tracking and response generation more difficult, as described in the SIMMC 2.1 challenge, part of DSTC11. In particular, as the number of objects in a scene increases, the difficulty grows dramatically. We propose a prompt-based multi-task learning encoder-decoder model in which each subtask uses a distinct prompt, encouraging the model to focus on the subtask at hand. Our system won the ambiguous candidate identification subtask and finished runner-up in multimodal coreference resolution (MM-Coref), multimodal dialog state tracking (MM-DST), and assistant response generation. Our code and model are publicly available at https://github.com/scutcyr/dstc11-simmc2.1-scut-bds-lab.
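As a rough illustration of the prompt-based multi-task idea described above, the sketch below prepends a subtask-specific prompt to a shared dialog context before feeding it to a generic encoder-decoder model. The T5 checkpoint, prompt strings, and subtask names are illustrative assumptions, not the authors' exact prompts or architecture.

```python
# Minimal sketch of prompt-based multi-task learning with an encoder-decoder,
# assuming a T5-style model from Hugging Face Transformers. Prompts and task
# names are hypothetical placeholders, not taken from the paper.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Hypothetical task-specific prompts prepended to the shared dialog context.
TASK_PROMPTS = {
    "ambiguous_candidates": "Identify ambiguous candidate objects: ",
    "mm_coref": "Resolve multimodal coreferences: ",
    "mm_dst": "Track the multimodal dialog state: ",
    "response_generation": "Generate the assistant response: ",
}

def build_input(task: str, dialog_context: str) -> str:
    """Prefix the dialog context with the prompt of the current subtask."""
    return TASK_PROMPTS[task] + dialog_context

# Example: route one dialog context through two different subtask prompts.
context = "User: Can I see that jacket on the left? System: ..."
for task in ("mm_dst", "response_generation"):
    inputs = tokenizer(build_input(task, context), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(task, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The design choice here is that all subtasks share one set of encoder-decoder weights, and the per-task prompt alone steers generation toward the current subtask's output format.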
Anthology ID: 2023.dstc-1.1
Volume: Proceedings of The Eleventh Dialog System Technology Challenge
Month: September
Year: 2023
Address: Prague, Czech Republic
Editors: Yun-Nung Chen, Paul Crook, Michel Galley, Sarik Ghazarian, Chulaka Gunasekara, Raghav Gupta, Behnam Hedayatnia, Satwik Kottur, Seungwhan Moon, Chen Zhang
Venues: DSTC | WS
Publisher: Association for Computational Linguistics
Pages: 1–8
URL: https://aclanthology.org/2023.dstc-1.1
Cite (ACL): Yirong Chen, Ya Li, Tao Wang, Xiaofen Xing, Xiangmin Xu, Quan Liu, Cong Liu, and Guoping Hu. 2023. Exploring Prompt-based Multi-task Learning for Multimodal Dialog State Tracking and Immersive Multimodal Conversation. In Proceedings of The Eleventh Dialog System Technology Challenge, pages 1–8, Prague, Czech Republic. Association for Computational Linguistics.
Cite (Informal): Exploring Prompt-based Multi-task Learning for Multimodal Dialog State Tracking and Immersive Multimodal Conversation (Chen et al., DSTC-WS 2023)
PDF: https://aclanthology.org/2023.dstc-1.1.pdf