Youngbum Kim


2024

pdf bib
Tree-of-Question: Structured Retrieval Framework for Korean Question Answering Systems
Dongyub Lee | Younghun Jeong | Hwa-Yeon Kim | Hongyeon Yu | Seunghyun Han | Taesun Whang | Seungwoo Cho | Chanhee Lee | Gunsu Lee | Youngbum Kim
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

We introduce Korean language-specific RAG-based QA systems, primarily through the innovative Tree-of-Question (ToQ) methodology and enhanced query generation techniques. We address the complex, multi-hop nature of real-world questions by effectively integrating advanced LLMs with nuanced query planning. Our comprehensive evaluations, including a newly created Korean multi-hop QA dataset, demonstrate our method’s ability to elevate response validity and accuracy, especially in deeper levels of reasoning. This paper not only showcases significant progress in handling the intricacies of Korean linguistic structures but also sets a new standard in the development of context-aware and linguistically sophisticated QA systems.

2018

pdf bib
Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging
Andrew Matteson | Chanhee Lee | Youngbum Kim | Heuiseok Lim
Proceedings of the 27th International Conference on Computational Linguistics

Due to the fact that Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically employs the use of sub-character features known as graphemes or otherwise utilizes comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models have been created with the assumption that character-level, dictionary-less morphological analysis was intractable due to the number of actions required. We present, in this study, a multi-stage action-based model that can perform morphological transformation and part-of-speech tagging using arbitrary units of input and apply it to the case of character-level Korean morphological analysis. Among models that do not employ prior linguistic knowledge, we achieve state-of-the-art word and sentence-level tagging accuracy with the Sejong Korean corpus using our proposed data-driven Bi-LSTM model.