Joonwoo Park


2023

Contrastively Pretrained Vision-Language Transformers and Domain Adaptation Methods for Multimodal TOD Systems
Youngjae Chang | Doo Young Kim | Jinyoung Kim | Keunha Kim | Hyunmook Cha | Suyoung Min | Youngjoong Ko | Kye-Hwan Lee | Joonwoo Park
Proceedings of The Eleventh Dialog System Technology Challenge

The Situated Interactive MultiModal Conversations (SIMMC2.1) Challenge 2022 is hosted by the Eleventh Dialog System Technology Challenge (DSTC11). This is the third consecutive year that multimodal dialog systems have been selected as an official track of the competition, prompted by continued interest from the research community. The task of SIMMC is to create a shopping assistant agent that can communicate with customers in a virtual store. It requires processing store scenes and product catalogs along with the customer’s request. The task is decomposed into four steps, each of which forms a subtask. In this work, we explore the common approaches to modeling multimodality and identify the most promising method. We also identify a discrepancy in using pretrained language models for dialog tasks and devise a simple domain-adaptation method to address it. Our model placed third in the object coreferencing, dialog state tracking, and response generation tasks.