Joo Won Sung

Also published as: Joo-won Sung


2024

pdf bib
Large Language Models can Share Images, Too!
Young-Jun Lee | Dokyong Lee | Joo Won Sung | Jonghwan Hyeon | Ho-Jin Choi
Findings of the Association for Computational Linguistics ACL 2024

This paper explores the image-sharing capability of Large Language Models (LLMs), such as GPT-4 and LLaMA 2, in a zero-shot setting. To facilitate a comprehensive evaluation of LLMs, we introduce the photochatplus dataset, which includes enriched annotations (ie intent, triggering sentence, image description, and salient information). Furthermore, we present the gradient-free and extensible Decide, Describe, and Retrieve () framework. With extensive experiments, we unlock the image-sharing capability of equipped with LLMs in zero-shot prompting, with ChatGPT achieving the best performance.Our findings also reveal the emergent image-sharing ability in LLMs under zero-shot conditions, validating the effectiveness of . We use this framework to demonstrate its practicality and effectiveness in two real-world scenarios: (1) human-bot interaction and (2) dataset augmentation. To the best of our knowledge, this is the first study to assess the image-sharing ability of various LLMs in a zero-shot setting. We make our source code and dataset publicly available at https://github.com/passing2961/DribeR.

2023

pdf bib
CopyT5: Copy Mechanism and Post-Trained T5 for Speech-Aware Dialogue State Tracking System
Cheonyoung Park | Eunji Ha | Yewon Jeong | Chi-young Kim | Haeun Yu | Joo-won Sung
Proceedings of The Eleventh Dialog System Technology Challenge

In a real-world environment, Dialogue State Tracking (DST) should use speech recognition results to perform tasks. However, most existing DST research has been conducted in text-based environments. This study aims to build a model that efficiently performs Automatic Speech Recognition-based DST. To operate robustly against speech noise, we used CopyT5, which adopted a copy mechanism, and trained the model using augmented data including speech noise. Furthermore, CopyT5 performed post-training using the masked language modeling method with the MultiWOZ dataset in T5 in order to learn the dialogue context better. The copy mechanism also mitigated name entity errors that may occur during DST generation. Experiments confirmed that data augmentation, post-training, and the copy mechanism effectively improve DST performance.