Hongjin Kim


2023

pdf bib
A Framework for Vision-Language Warm-up Tasks in Multimodal Dialogue Models
Jaewook Lee | Seongsik Park | Seong-Heum Park | Hongjin Kim | Harksoo Kim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Most research on multimodal open-domain dialogue agents has focused on pretraining and multi-task learning using additional rich datasets beyond a given target dataset. However, methods for exploiting these additional datasets can be quite limited in real-world settings, creating a need for more efficient methods for constructing agents based solely on the target dataset. To address these issues, we present a new learning strategy called vision-language warm-up tasks for multimodal dialogue models (VLAW-MDM). This strategy does not require the use of large pretraining or multi-task datasets but rather relies solely on learning from target data. Moreover, our proposed approach automatically generate captions for images and incorporate them into the model’s input to improve the contextualization of visual information. Using this novel approach, we empirically demonstrate that our learning strategy is effective for limited data and relatively small models. The result show that our method achieved comparable and in some cases superior performance compared to existing state-of-the-art models on various evaluation metrics.

2021

pdf bib
The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference
Hongjin Kim | Damrin Kim | Harksoo Kim
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The objective of anaphora resolution in dialogue shared-task is to go above and beyond the simple cases of coreference resolution in written text on which NLP has mostly focused so far, which arguably overestimate the performance of current SOTA models. The anaphora resolution in dialogue shared-task consists of three subtasks; subtask1, resolution of anaphoric identity and non-referring expression identification, subtask2, resolution of bridging references, and subtask3, resolution of discourse deixis/abstract anaphora. In this paper, we propose the pipelined model (i.e., a resolution of anaphoric identity and a resolution of bridging references) for the subtask1 and the subtask2. In the subtask1, our model detects mention via the parentheses prediction. Then, we yield mention representation using the token representation constituting the mention. Mention representation is fed to the coreference resolution model for clustering. In the subtask2, our model resolves bridging references via the MRC framework. We construct query for each entity as “What is related of ENTITY?”. The input of our model is query and documents(i.e., all utterances of dialogue). Then, our model predicts entity span that is answer for query.