Hyewon Choi


2023

Multi-Task Learning for Ambiguous Candidate Identification with Pre-trained Model
Daesik Jang | Hyewon Choi
Proceedings of The Eleventh Dialog System Technology Challenge

Research using multimodal datasets that combine image and text information has recently been active. One such dataset is SIMMC2.1, which is more challenging than text-only dialogue because a system must understand the relationship between images and text before predicting an answer. Text-only models such as BERT or GPT-2 are therefore insufficient, and models with both image and language understanding abilities should be considered. We propose a new model that is effective for the ambiguous candidate identification task in the DSTC11 SIMMC2.1 track. It is a simple two-step pipeline: the first step checks whether the current user utterance is ambiguous, and the second step extracts the objects mentioned in the ambiguous utterance. We suggest a new learning framework with a pre-trained image model and a pre-trained text model that is effective for this task. Experiments show that the proposed method improves model performance, and our model achieved 3rd place in sub-task 1 of the SIMMC2.1 track.
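
The abstract describes the two-step structure only at a high level; below is a minimal, hypothetical sketch of such a pipeline. The checkpoint names (bert-base-uncased, openai/clip-vit-base-patch32), the similarity threshold, and the use of CLIP for the second step are illustrative assumptions, not the authors' actual architecture, and the classification head would need fine-tuning on SIMMC2.1 disambiguation labels before its output is meaningful.

```python
import torch
from PIL import Image
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          CLIPModel, CLIPProcessor)

# Step 1: text-only ambiguity classifier. The untrained 2-label head is a
# placeholder; it must be fine-tuned on SIMMC2.1 disambiguation labels.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ambiguity_clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def is_ambiguous(utterance: str) -> bool:
    """Return True if the utterance is classified as ambiguous (label 1)."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = ambiguity_clf(**inputs).logits
    return int(logits.argmax(-1)) == 1

# Step 2: score scene objects against the utterance with a pretrained
# image/text encoder pair (CLIP here, as a stand-in for the paper's models).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ambiguous_candidates(utterance: str,
                         object_crops: list[Image.Image],
                         threshold: float = 0.25) -> list[int]:
    """Return indices of object crops whose CLIP image embedding is
    sufficiently similar to the utterance's text embedding."""
    if not is_ambiguous(utterance):
        return []
    inputs = clip_proc(text=[utterance], images=object_crops,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        text_emb = clip.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"])
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
    sims = torch.nn.functional.cosine_similarity(text_emb, img_emb)
    return [i for i, s in enumerate(sims.tolist()) if s > threshold]
```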

2022

How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image
Hyewon Choi | Yejun Yoon | Seunghyun Yoon | Kunwoo Park
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations

This study investigates how fake news uses thumbnail images for news articles. We aim to capture the degree of semantic incongruity between news text and its image using pretrained CLIP representations. Motivated by the stylistic distinctiveness of fake news text, we examine whether fake news tends to pair articles with images that are irrelevant to the news content. Results show that fake news exhibits a higher degree of semantic incongruity than general news. We further attempt to detect such image-text incongruity by training classification models on a newly generated dataset. A manual evaluation suggests our method can find news articles whose thumbnail image is semantically irrelevant to the news text with an accuracy of 0.8. We also release a new dataset of image and news text pairs with incongruity labels, facilitating future studies in this direction.
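
As a rough illustration of the incongruity signal described above, the sketch below scores a headline-thumbnail pair with pretrained CLIP embeddings. The specific checkpoint and the 1 minus cosine-similarity formulation are assumptions for illustration, not the paper's exact measure.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical checkpoint choice; the paper uses pretrained CLIP
# representations, but the exact variant is an assumption here.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def incongruity_score(headline: str, thumbnail_path: str) -> float:
    """Return 1 - cosine similarity between CLIP text and image embeddings.

    Higher values indicate a thumbnail that is semantically further from
    the news text, i.e., the incongruity signal studied in the paper.
    """
    image = Image.open(thumbnail_path)
    inputs = processor(text=[headline], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(
            pixel_values=inputs["pixel_values"])
    sim = torch.nn.functional.cosine_similarity(text_emb, image_emb).item()
    return 1.0 - sim
```

Such a per-article score could then serve as an input feature to the kind of classification model the abstract mentions, though the training setup here is left unspecified.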