Kyoung-Woon On


2024

How Well Do Large Language Models Truly Ground?
Hyunji Lee | Se June Joo | Chaeeun Kim | Joel Jang | Doyoung Kim | Kyoung-Woon On | Minjoon Seo
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines “grounding” as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under this definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.

2023

Efficient Latent Variable Modeling for Knowledge-Grounded Dialogue Generation
Gunsoo Han | Daejin Jo | Daniel Nam | Eunseop Yoon | Taehwan Kwon | Seungeun Rho | Kyoung-Woon On | Chang Yoo | Sungwoong Kim
Findings of the Association for Computational Linguistics: EMNLP 2023

Knowledge-grounded dialogue generation requires first retrieving appropriate external knowledge based on a conversational context and then generating a response grounded on the retrieved knowledge. In general, these two sequential modules, a knowledge retriever and a response generator, have been separately trained in a supervised manner. However, obtaining intermediate labels of the ground-truth knowledge is expensive, especially in open-domain conversations. Latent variable modeling avoids the need for these labels. In this paper, we propose an efficient algorithm for this latent variable modeling that is able to leverage a large amount of dialogue data. Rather than directly training the complex retriever, we adapt a query generator with an off-the-shelf retriever, and the query generator and response generator are simultaneously trained over the latent query variable. Moreover, we employ the evidence lower bound as a training objective and modify it to make the joint training robust. Experimental results on diverse knowledge-grounded dialogue datasets show that the proposed algorithm significantly outperforms the supervised learning algorithm even without the use of annotated knowledge, while maintaining efficiency and scalability.

2020

Toward General Scene Graph: Integration of Visual Semantic Knowledge with Entity Synset Alignment
Woo Suk Choi | Kyoung-Woon On | Yu-Jung Heo | Byoung-Tak Zhang
Proceedings of the First Workshop on Advances in Language and Vision Research

A scene graph is a graph representation that explicitly represents high-level semantic knowledge of an image, such as objects, attributes of objects, and relationships between objects. Various tasks have been proposed for the scene graph, but each has a limited vocabulary and biased information due to its own hypothesis. Therefore, the results of each task are not generalizable and are difficult to apply to other downstream tasks. In this paper, we propose Entity Synset Alignment (ESA), a method that creates a general scene graph by efficiently aligning various semantic knowledge to solve this bias problem. ESA uses a large-scale lexical database, WordNet, and Intersection over Union (IoU) to align the object labels in multiple scene graphs/semantic knowledge. In our experiments, the integrated scene graph is applied to image-caption retrieval as a downstream task. We confirm that integrating multiple scene graphs helps to obtain better representations of images.