Siwon Kim


2024

GrounDial: Human-norm Grounded Safe Dialog Response Generation
Siwon Kim | Shuyang Dai | Mohammad Kachuee | Shayan Ray | Tara Taghavi | Sungroh Yoon
Findings of the Association for Computational Linguistics: EACL 2024

Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, agreeing with offensive user input or including toxic content. Previous research aimed to alleviate the toxicity by fine-tuning LLMs with manually annotated safe dialogue histories. However, the dependency on additional tuning incurs substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. GrounDial's hybrid approach of in-context learning and human-norm-guided decoding makes responses quantitatively and qualitatively safer even without additional data or tuning.

2020

Interpretation of NLP models through input marginalization
Siwon Kim | Jihun Yi | Eunji Kim | Sungroh Yoon
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

To demystify the “black box” property of deep neural networks for natural language processing (NLP), several methods have been proposed to interpret their predictions by measuring the change in prediction probability after erasing each token of an input. Since existing methods replace each token with a predefined value (i.e., zero), the resulting sentence lies out of the training data distribution, yielding misleading interpretations. In this study, we raise the out-of-distribution problem induced by the existing interpretation methods and present a remedy; we propose to marginalize each token out. We interpret various NLP models trained for sentiment analysis and natural language inference using the proposed method.
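
The contrast between erasing a token and marginalizing it out can be illustrated with a minimal sketch. This is not the authors' implementation: the toy classifier, the fixed replacement distribution, and the function names are hypothetical stand-ins, and scoring by the difference in log-odds is an assumption made here for illustration. In practice the replacement distribution would come from a model that predicts the token from its context.

```python
import math
from typing import Callable, Dict, List, Sequence


def log_odds(p: float, eps: float = 1e-12) -> float:
    """Log-odds (base 2) of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log2(p / (1.0 - p))


def marginalized_attribution(
    tokens: Sequence[str],
    position: int,
    classifier: Callable[[List[str]], float],
    replacement_dist: Callable[[Sequence[str], int], Dict[str, float]],
) -> float:
    """Score one token by marginalizing it out instead of zeroing it.

    The marginal prediction is the classifier output averaged over
    candidate replacements, weighted by how likely each candidate is.
    The log-odds difference used as the score is an assumption here.
    """
    p_full = classifier(list(tokens))
    p_marginal = 0.0
    for candidate, weight in replacement_dist(tokens, position).items():
        replaced = list(tokens)
        replaced[position] = candidate
        p_marginal += weight * classifier(replaced)
    return log_odds(p_full) - log_odds(p_marginal)


# Hypothetical toy sentiment classifier: probability of "positive".
def toy_classifier(tokens: List[str]) -> float:
    if "good" in tokens:
        return 0.9
    if "bad" in tokens:
        return 0.1
    return 0.5


# Hypothetical replacement distribution; a real one would depend on context.
def toy_replacement_dist(tokens: Sequence[str], position: int) -> Dict[str, float]:
    return {"good": 0.5, "bad": 0.3, "fine": 0.2}


sentence = ["the", "movie", "was", "good"]
score_the = marginalized_attribution(sentence, 0, toy_classifier, toy_replacement_dist)
score_good = marginalized_attribution(sentence, 3, toy_classifier, toy_replacement_dist)
```

In this toy setup, replacing "the" never changes the prediction, so its score is approximately zero, while "good" receives a positive score because plausible alternatives lower the predicted probability.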