Yoonna Jang


2023

pdf bib
Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations
Yoonna Jang | Suhyune Son | Jeongwoo Lee | Junyoung Son | Yuna Hur | Jungwoo Lim | Hyeonseok Moon | Kisu Yang | Heuiseok Lim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, entity-level hallucination that causes critical misinformation and undesirable conversation is one of the major concerns. To address this issue, we propose a post-hoc refinement method called REM. It aims to enhance the quality and faithfulness of hallucinated utterances by refining them based on the source knowledge. If the generated utterance has a low source-faithfulness score with the given knowledge, REM mines the key entities in the knowledge and implicitly uses them for refining the utterances. We verify that our method reduces entity hallucination in the utterance. Also, we show the adaptability and efficacy of REM with extensive experiments and generative results. Our code is available at https://github.com/YOONNAJANG/REM.

pdf bib
PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
Seungjun Lee | Yoonna Jang | Chanjun Park | Jungseob Lee | Jaehyung Seo | Hyeonseok Moon | Sugyeong Eo | Seounghoon Lee | Bernardo Yahya | Heuiseok Lim
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

English is acknowledged worldwide as a mode of communication. However, due to the absence of realistic practicing scenarios, students learning English as a foreign language (EFL) typically have limited chances to converse and share feedback with others. In this paper, we propose PEEP-Talk, a real-world situational dialogue-based chatbot designed for English education. It also naturally switches to a new topic or situation in response to out-of-topic utterances, which are common among English beginners. Furthermore, PEEP-Talk provides feedback score on conversation and grammar error correction. We performed automatic and user evaluations to validate performance and education efficiency of our system. The results show that PEEP-Talk generates appropriate responses in various real-life situations while providing accurate feedback to learners. Moreover, we demonstrate a positive impact on English-speaking, grammar, and English learning anxiety, implying that PEEP-Talk can lower the barrier to learning natural conversation in effective ways.

pdf bib
Analysis of Utterance Embeddings and Clustering Methods Related to Intent Induction for Task-Oriented Dialogue
Jeiyoon Park | Yoonna Jang | Chanhee Lee | Heuiseok Lim
Proceedings of The Eleventh Dialog System Technology Challenge

The focus of this work is to investigate unsupervised approaches to overcome quintessential challenges in designing task-oriented dialog schema: assigning intent labels to each dialog turn (intent clustering) and generating a set of intents based on the intent clustering methods (intent induction). We postulate there are two salient factors for automatic induction of intents: (1) clustering algorithm for intent labeling and (2) user utterance embedding space. We compare existing off-the-shelf clustering models and embeddings based on DSTC11 evaluation. Our extensive experiments demonstrate that the combined selection of utterance embedding and clustering method in the intent induction task should be carefully considered. We also present that pretrained MiniLM with Agglomerative clustering shows significant improvement in NMI, ARI, F1, accuracy and example coverage in intent induction tasks. The source codes are available at https://github.com/Jeiyoon/dstc11-track2.

pdf bib
Explore the Way: Exploring Reasoning Path by Bridging Entities for Effective Cross-Document Relation Extraction
Junyoung Son | Jinsung Kim | Jungwoo Lim | Yoonna Jang | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2023

Cross-document relation extraction (CodRED) task aims to infer the relation between two entities mentioned in different documents within a reasoning path. Previous studies have concentrated on merely capturing implicit relations between the entities. However, humans usually utilize explicit information chains such as hyperlinks or additional searches to find the relations between two entities. Inspired by this, we propose Path wIth expLOraTion (PILOT) that provides the enhanced reasoning path by exploring the explicit clue information within the documents. PILOT finds the bridging entities which directly guide the paths between the entities and then employs them as stepstones to navigate desirable paths. We show that models with PILOT outperform the baselines in the CodRED task. Furthermore, we offer a variety of analyses to verify the validity of the reasoning paths constructed through PILOT, including evaluations using large language models such as ChatGPT.

2022

pdf bib
FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
Chanjun Park | Yoonna Jang | Seolhwa Lee | Sungjin Park | Heuiseok Lim
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety dealing with foreign languages, by employing a humanoid robot NAO and various deep learning models. A persona-based dialogue system that is embedded in NAO provides an interesting and consistent multi-turn dialogue for users. Also, an grammar error correction system promotes improvement in grammar skills of the users. Thus, our system enables personalized learning based on persona dialogue and facilitates grammar learning of a user using grammar error feedback. Furthermore, we verified whether FreeTalky provides practical help in alleviating xenoglossophobia by replacing the real human in the conversation with a NAO robot, through human evaluation.

pdf bib
PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities
Chanjun Park | Yoonna Jang | Seolhwa Lee | Jaehyung Seo | Kisu Yang | Heuiseok Lim
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations

Children with language disabilities face communication difficulties in daily life. They are often deprived of the opportunity to participate in social activities due to their difficulty in understanding or using natural language. In this regard, Augmentative and Alternative Communication (AAC) can be a practical means of communication for children with language disabilities. In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities. PicTalky can process both text and pictograms more accurately by connecting a series of neural-based NLP modules. Additionally, we perform quantitative and qualitative analyses on the modules of PicTalky. By using this service, it is expected that those suffering from language problems will be able to express their intentions or desires more easily and improve their quality of life. We have made the models freely available alongside a demonstration of the web interface. Furthermore, we implemented robotics AAC for the first time by applying PicTalky to the NAO robot.

pdf bib
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
Jaehyung Seo | Seounghoon Lee | Chanjun Park | Yoonna Jang | Hyeonseok Moon | Sugyeong Eo | Seonmin Koo | Heuiseok Lim
Findings of the Association for Computational Linguistics: NAACL 2022

Recent natural language understanding (NLU) research on the Korean language has been vigorously maturing with the advancements of pretrained language models and datasets. However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i.e., generative commonsense reasoning). The two major challenges are inadequate data resources to develop generative commonsense reasoning regarding Korean linguistic features and to evaluate language models which are necessary for natural language generation (NLG). To solve these problems, we propose a text-generation dataset for Korean generative commonsense reasoning and language model evaluation. In this work, a semi-automatic dataset construction approach filters out contents inexplicable to commonsense, ascertains quality, and reduces the cost of building the dataset. We also present an in-depth analysis of the generation results of language models with various evaluation metrics along with human-annotated scores. The whole dataset is publicly available at (https://aihub.or.kr/opendata/korea-university).

pdf bib
You Truly Understand What I Need : Intellectual and Friendly Dialog Agents grounding Persona and Knowledge
Jungwoo Lim | Myunghoon Kang | Yuna Hur | Seung Won Jeong | Jinsung Kim | Yoonna Jang | Dongyub Lee | Hyesung Ji | DongHoon Shin | Seungryong Kim | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2022

To build a conversational agent that interacts fluently with humans, previous studies blend knowledge or personal profile into the pre-trained language model. However, the model that considers knowledge and persona at the same time is still limited, leading to hallucination and a passive way of using personas. We propose an effective dialogue agent that grounds external knowledge and persona simultaneously. The agent selects the proper knowledge and persona to use for generating the answers with our candidate scoring implemented with a poly-encoder. Then, our model generates the utterance with lesser hallucination and more engagingness utilizing retrieval augmented generation with knowledge-persona enhanced query. We conduct experiments on the persona-knowledge chat and achieve state-of-the-art performance in grounding and generation tasks on the automatic metrics. Moreover, we validate the answers from the models regarding hallucination and engagingness through human evaluation and qualitative results. We show our retriever’s effectiveness in extracting relevant documents compared to the other previous retrievers, along with the comparison of multiple candidate scoring methods. Code is available at https://github.com/dlawjddn803/INFO

pdf bib
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
Heuiseok Lim | Seungryong Kim | Yeonsoo Lee | Steve Lin | Paul Hongsuck Seo | Yumin Suh | Yoonna Jang | Jungwoo Lim | Yuna Hur | Suhyune Son
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge

2020

pdf bib
I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
Jungwoo Lim | Dongsuk Oh | Yoonna Jang | Kisu Yang | Heuiseok Lim
Proceedings of the 28th International Conference on Computational Linguistics

CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve the performance with distributed representation without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing Abstract Meaning Representation (AMR) graph generated from input questions and an external commonsense knowledge graph, ConceptNet (CN). Then the ACP graph is exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.