Young-Jun Lee

Also published as: Young-jun Lee


2024

Large Language Models can Share Images, Too!
Young-Jun Lee | Dokyong Lee | Joo Won Sung | Jonghwan Hyeon | Ho-Jin Choi
Findings of the Association for Computational Linguistics: ACL 2024

This paper explores the image-sharing capability of Large Language Models (LLMs), such as GPT-4 and LLaMA 2, in a zero-shot setting. To facilitate a comprehensive evaluation of LLMs, we introduce the PhotoChat++ dataset, which includes enriched annotations (i.e., intent, triggering sentence, image description, and salient information). Furthermore, we present the gradient-free and extensible Decide, Describe, and Retrieve (DribeR) framework. With extensive experiments, we unlock the image-sharing capability of DribeR equipped with LLMs in zero-shot prompting, with ChatGPT achieving the best performance. Our findings also reveal the emergent image-sharing ability of LLMs under zero-shot conditions, validating the effectiveness of DribeR. We use this framework to demonstrate its practicality and effectiveness in two real-world scenarios: (1) human-bot interaction and (2) dataset augmentation. To the best of our knowledge, this is the first study to assess the image-sharing ability of various LLMs in a zero-shot setting. We make our source code and dataset publicly available at https://github.com/passing2961/DribeR.
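As an illustration of the decide-describe-retrieve idea summarized above, the sketch below wires a placeholder LLM call into three stages; the prompts, function names, and the `call_llm` stub are assumptions for illustration, not the authors' actual implementation.

```python
# Illustrative sketch of a Decide-Describe-Retrieve style pipeline (not the
# paper's exact prompts or code). `call_llm` stands in for any chat LLM API.
import numpy as np

def call_llm(prompt: str) -> str:
    """Placeholder for a zero-shot LLM call (e.g., ChatGPT); returns raw text."""
    raise NotImplementedError("plug in your own LLM client here")

def decide(dialogue: str) -> bool:
    """Stage 1: ask the LLM whether an image should be shared at this turn."""
    answer = call_llm(
        f"Dialogue:\n{dialogue}\n\nShould the speaker share an image now? Answer yes or no."
    )
    return answer.strip().lower().startswith("yes")

def describe(dialogue: str) -> str:
    """Stage 2: ask the LLM to describe an image that fits the dialogue."""
    return call_llm(
        f"Dialogue:\n{dialogue}\n\nDescribe, in one sentence, an image the speaker could share."
    )

def retrieve(description: str, image_embs: np.ndarray, encode_text) -> int:
    """Stage 3: return the index of the image whose embedding is most similar
    to the text description (CLIP-style retrieval, gradient-free)."""
    q = encode_text(description)
    sims = image_embs @ q / (np.linalg.norm(image_embs, axis=1) * np.linalg.norm(q) + 1e-8)
    return int(np.argmax(sims))
```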

Enhancing Arguments Recognition for Financial Mathematical Reasoning over Hybrid Data
Jinsu Lim | Yechan Hwang | Young-Jun Lee | Ho-Jin Choi
Findings of the Association for Computational Linguistics: EMNLP 2024

Mathematical question answering over long-form documents is challenging in domains such as finance and Wikipedia because of the abundance of candidate arguments within the evidence, which complicates recognizing the proper arguments for mathematical reasoning and makes learning difficult. In this paper, we propose an approach for training a generator that improves argument recognition. Our method raises the probabilities of the proper arguments during reasoning-program generation so that the arguments comprising the ground truth receive higher weights. The proposed approach consists of an argument aggregator, which models the probability of each candidate being generated, and an argument set loss, which computes the cross-entropy between that probability and the candidate's presence in the ground-truth argument set. In our experiments, we show improvements of 3.62% in execution accuracy and 3.98% in program accuracy over the existing FinQANet model on a financial mathematical QA dataset. We also observe that the similarity of argument sets between the generated program and the ground truth improves by about 2.9%, indicating a mitigation of the misrecognition problem.
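The argument set loss described above can be read as a binary cross-entropy between per-candidate probabilities and membership in the ground-truth argument set; the sketch below is a minimal PyTorch rendering under that reading, not the paper's code.

```python
# Minimal sketch of an "argument set loss": the aggregator scores each
# candidate argument, and we score those against a 0/1 indicator of membership
# in the ground-truth argument set with binary cross-entropy.
import torch
import torch.nn.functional as F

def argument_set_loss(candidate_scores: torch.Tensor,
                      gold_mask: torch.Tensor) -> torch.Tensor:
    """candidate_scores: (num_candidates,) unnormalized scores from the aggregator.
    gold_mask: (num_candidates,) 1.0 if the candidate argument occurs in the
    ground-truth program, else 0.0."""
    return F.binary_cross_entropy_with_logits(candidate_scores, gold_mask)

# Toy usage: 5 candidate arguments, the 1st and 4th appear in the gold program.
scores = torch.randn(5, requires_grad=True)
gold = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0])
loss = argument_set_loss(scores, gold)  # added to the usual generation loss
loss.backward()
```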

Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Young-Jun Lee | Dokyong Lee | Junyoung Youn | Kyeong-Jin Oh | Byungsoo Ko | Jonghwan Hyeon | Ho-Jin Choi
Findings of the Association for Computational Linguistics: EMNLP 2024

Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social interaction, and (2) a lack of personalized image-sharing behavior. In this work, we introduce Stark, a large-scale long-term multi-modal dialogue dataset that covers a wide range of social personas in a multi-modality format, time intervals, and images. To construct Stark automatically, we propose a novel multi-modal contextualization framework that generates long-term multi-modal dialogue distilled from ChatGPT and our proposed image aligner. Using Stark, we train a 7B multi-modal conversation model that demonstrates impressive visual imagination ability. Furthermore, we demonstrate the effectiveness of our dataset in human evaluation. The code, dataset, and model will be publicly released after publication.

DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
Young-Jun Lee | Byungsoo Ko | Han-Gyu Kim | Jonghwan Hyeon | Ho-Jin Choi
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Because sharing images in instant messaging is a crucial factor, there has been active research on learning image-text multi-modal dialogue models. However, training a well-generalized multi-modal dialogue model remains challenging due to the low quality and limited diversity of images per dialogue in existing multi-modal dialogue datasets. In this paper, we propose an automated pipeline for constructing a multi-modal dialogue dataset that ensures both dialogue quality and image diversity with minimal human effort. In our pipeline, to guarantee coherence between images and dialogue, we prompt GPT-4 to infer potential image-sharing moments - specifically, the utterance, speaker, rationale, and image description. Furthermore, we leverage CLIP similarity to maintain consistency among the multiple images aligned to an utterance. Through this pipeline, we introduce DialogCC, a high-quality and diverse multi-modal dialogue dataset that surpasses existing datasets in terms of quality and diversity in human evaluation. Our comprehensive experiments highlight that when multi-modal dialogue models are trained on our dataset, their generalization performance on unseen dialogue datasets is significantly enhanced. We make our source code and dataset publicly available (https://dialogcc.github.io/).
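A rough sketch of the CLIP-similarity filtering step described above follows; the thresholds, embedding inputs, and function names are illustrative assumptions rather than the paper's settings.

```python
# Sketch of a CLIP-similarity filter in the spirit of the pipeline above:
# keep images that match the GPT-4 image description and remain mutually
# consistent with the other kept images. Thresholds are illustrative.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def filter_images(desc_emb: np.ndarray,
                  image_embs: list[np.ndarray],
                  text_image_thresh: float = 0.25,
                  image_image_thresh: float = 0.6) -> list[int]:
    # step 1: text-image relevance against the inferred image description
    kept = [i for i, e in enumerate(image_embs)
            if cosine(desc_emb, e) >= text_image_thresh]
    # step 2: drop images that disagree with the rest of the kept set
    consistent = []
    for i in kept:
        others = [j for j in kept if j != i]
        if not others or np.mean([cosine(image_embs[i], image_embs[j])
                                  for j in others]) >= image_image_thresh:
            consistent.append(i)
    return consistent
```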

2023

Semantic Ambiguity Detection in Sentence Classification using Task-Specific Embeddings
Jong Myoung Kim | Young-jun Lee | Sangkeun Jung | Ho-jin Choi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Ambiguity is a major obstacle to providing services based on sentence classification. However, because of the structural limitations of such services, there may not be sufficient contextual information to resolve the ambiguity. In this situation, we focus on ambiguity detection so that service design can take ambiguity into account. We utilize similarity in a semantic space to detect ambiguity in service scenarios and training data. In addition, we apply task-specific embeddings to improve performance. Our results demonstrate that ambiguities, and the resulting labeling errors in training data or scenarios, can be detected. Additionally, we confirm that our method can be used to debug services.
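One simple way to realize similarity-based ambiguity detection in an embedding space is a nearest-neighbour label-disagreement check, sketched below; the neighbourhood size, threshold, and encoder are assumptions, not the authors' configuration.

```python
# Illustrative sketch (not the authors' implementation): flag a training
# sentence as potentially ambiguous when its nearest neighbours in a
# task-specific embedding space mostly carry a different label.
import numpy as np

def flag_ambiguous(embs: np.ndarray, labels: list[str], k: int = 5,
                   min_disagree: float = 0.5) -> list[int]:
    """embs: (n, d) sentence embeddings from a task-specific encoder.
    Returns indices whose k nearest neighbours mostly disagree on the label."""
    normed = embs / (np.linalg.norm(embs, axis=1, keepdims=True) + 1e-8)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a point is never its own neighbour
    flagged = []
    for i in range(len(labels)):
        nn = np.argsort(-sims[i])[:k]
        disagree = sum(labels[j] != labels[i] for j in nn) / k
        if disagree >= min_disagree:
            flagged.append(i)
    return flagged
```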

2022

Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation
Young-Jun Lee | Chae-Gyun Lim | Ho-Jin Choi
Proceedings of the 29th International Conference on Computational Linguistics

Since empathy plays a crucial role in increasing social bonding between people, many studies have designed their own dialogue agents to be empathetic using the well-established method of fine-tuning. However, they do not use prompt-based in-context learning, which has shown powerful performance on various natural language processing (NLP) tasks, for empathetic dialogue generation. Although several studies have investigated few-shot in-context learning for empathetic dialogue generation, an in-depth analysis of empathetic dialogue generation with in-context learning is still lacking, especially for GPT-3 (Brown et al., 2020). In this study, we explore whether GPT-3 can generate empathetic dialogues through prompt-based in-context learning in both zero-shot and few-shot settings. To enhance performance, we propose two new in-context example selection methods, SITSM and EMOSITSM, that utilize emotion and situational information. We also introduce a new automatic evaluation method, DIFF-EPITOME, which reflects the human tendency to express empathy. Our analysis shows that DIFF-EPITOME is effective in measuring the degree of human empathy. We show that GPT-3 achieves performance competitive with Blender 90M, a state-of-the-art dialogue generation model, in both automatic and human evaluation. Our code is available at https://github.com/passing2961/EmpGPT-3.
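A plausible, simplified reading of emotion- and situation-aware example selection is sketched below; the ranking rule and data fields are assumptions and should not be taken as the exact SITSM/EMOSITSM procedures.

```python
# Rough sketch of emotion/situation-aware in-context example selection
# (selection criteria here are illustrative, not the paper's algorithms).
import numpy as np

def select_examples(test_situation_emb: np.ndarray,
                    test_emotion: str,
                    pool: list[dict],  # each: {"situation_emb", "emotion", "dialogue"}
                    k: int = 4) -> list[dict]:
    """Prefer pool dialogues that share the test emotion, ranked by cosine
    similarity of their situation embeddings to the test situation."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    same_emotion = [ex for ex in pool if ex["emotion"] == test_emotion] or pool
    ranked = sorted(same_emotion,
                    key=lambda ex: cos(test_situation_emb, ex["situation_emb"]),
                    reverse=True)
    return ranked[:k]  # prepend these dialogues to the GPT-3 prompt
```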

PERSONACHATGEN: Generating Personalized Dialogues using GPT-3
Young-Jun Lee | Chae-Gyun Lim | Yunsu Choi | Ji-Hui Im | Ho-Jin Choi
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge

Recently, many prior works have made their agents generate more personalized and engaging responses using PersonaChat. However, since this dataset was frozen in 2018, dialogue agents trained on it would not know how to interact with a human who loves “Wandavision.” One way to alleviate this problem is to create a large-scale dataset. In this work, we introduce the pipeline for creating PersonaChatGen, which comprises three main components: creating (1) ProfileGen, (2) the persona set, and (3) PersonaChatGen. To harness GPT-3’s generation ability, we also define a taxonomy of hierarchical persona categories derived from a social profiling taxonomy. To create speaker-consistent persona sets, we propose a simple contradiction-based iterative sentence replacement algorithm, named CoNL. Moreover, to prevent GPT-3 from generating harmful content, we present two filtering pipelines, one each for ProfileGen and PersonaChatGen. Through an analysis of PersonaChatGen, we show that GPT-3 can generate personalized dialogues containing diverse personas. Furthermore, we show that Blender 90M, a state-of-the-art model, achieves higher performance when trained on our dataset.
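A toy approximation of contradiction-based persona set construction in the spirit of CoNL follows; the greedy loop and the placeholder NLI call are illustrative assumptions, not the algorithm as published.

```python
# Sketch of contradiction-aware persona set construction (the NLI checker and
# candidate source are placeholders, not the paper's components).
def contradicts(premise: str, hypothesis: str) -> bool:
    """Placeholder for an NLI model call returning True on 'contradiction'."""
    raise NotImplementedError("plug in an NLI classifier here")

def build_consistent_persona(candidates: list[str], max_size: int = 5) -> list[str]:
    """Grow a persona set, skipping (i.e., replacing with later candidates)
    any sentence that contradicts a sentence already accepted."""
    persona: list[str] = []
    for sent in candidates:
        if len(persona) >= max_size:
            break
        if all(not contradicts(prev, sent) for prev in persona):
            persona.append(sent)
    return persona
```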

2020

Korean-Specific Emotion Annotation Procedure Using N-Gram-Based Distant Supervision and Korean-Specific-Feature-Based Distant Supervision
Young-Jun Lee | Chae-Gyun Lim | Ho-Jin Choi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Detecting emotions from text is an important NLP task, but it is limited by the scarcity of manually labeled data. To overcome this limitation, many researchers have annotated unlabeled data using frequently adopted annotation procedures. However, most of these studies focus mainly on English and do not consider the characteristics of the Korean language. In this paper, we present a Korean-specific annotation procedure that consists of two parts, namely n-gram-based distant supervision and Korean-specific-feature-based distant supervision. We leverage distant supervision with n-grams and Korean emotion lexicons, and then incorporate Korean-specific emotion features. Through experiments, we show the effectiveness of our procedure by comparison with the KTEA dataset. Additionally, we construct a large-scale emotion-labeled dataset, the Korean Movie Review Emotion (KMRE) dataset, using our procedure. To construct this dataset, we use a large-scale sentiment movie review corpus as the unlabeled data and a Korean emotion lexicon provided by KTEA. We also perform an emotion classification task and a human evaluation on the KMRE dataset.
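A minimal sketch of n-gram-based distant supervision with an emotion lexicon is given below; the toy lexicon entries and the majority-vote rule are assumptions for illustration, not the paper's resources.

```python
# Sketch of n-gram-based distant supervision: label a sentence with the emotion
# whose lexicon n-grams it contains most often (toy lexicon, illustrative rule).
from collections import Counter

EMOTION_LEXICON = {  # hypothetical toy lexicon: n-gram -> emotion
    "so happy": "joy",
    "really scared": "fear",
    "fed up": "anger",
}

def ngrams(tokens: list[str], n: int) -> list[str]:
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def distant_label(sentence: str, max_n: int = 3) -> str | None:
    """Return the majority emotion among matched lexicon n-grams; None if no match."""
    tokens = sentence.lower().split()
    hits = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in EMOTION_LEXICON:
                hits[EMOTION_LEXICON[gram]] += 1
    return hits.most_common(1)[0][0] if hits else None
```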