Donghee Han

2025

pdf bib abs
Leveraging LLM-Generated Schema Descriptions for Unanswerable Question Detection in Clinical Data
Donghee Han | Seungjae Lim | Daeyoung Roh | Sangryul Kim | Sehyun Kim | Mun Yong Yi
Proceedings of the 31st International Conference on Computational Linguistics

Recent advancements in large language models (LLMs) have boosted research on generating SQL queries from domain-specific questions, particularly in the medical domain. A key challenge is detecting and filtering unanswerable questions. Existing methods often relying on model uncertainty, but these require extra resources and lack interpretability. We propose a lightweight model that predicts relevant database schemas to detect unanswerable questions, enhancing interpretability and addressing the data imbalance in binary classification tasks. Furthermore, we found that LLM-generated schema descriptions can significantly enhance the prediction accuracy. Our method provides a resource-efficient solution for unanswerable question detection in domain-specific question answering systems.

pdf bib abs
Rethinking LLM-Based Recommendations: A Personalized Query-Driven Parallel Integration
Donghee Han | Hwanjun Song | Mun Yong Yi
Findings of the Association for Computational Linguistics: EMNLP 2025

Recent studies have explored integrating large langucage models (LLMs) into recommendation systems but face several challenges, including training-induced bias and bottlenecks from serialized architecture.To effectively address these issues, we propose a Query-to-Recommendation, a parallel recommendation framework that decouples LLMs from candidate pre-selection and instead enables direct retrieval over the entire item pool. Our framework connects LLMs and recommendation models in a parallel manner, allowing each component to independently utilize its strengths without interfering with the other. In this framework, LLMs are utilized to generate feature-enriched item descriptions and personalized user queries, allowing for capturing diverse preferences and enabling rich semantic matching in a zero-shot manner. To effectively combine the complementary strengths of LLM and collaborative signals, we introduce an adaptive reranking strategy. Extensive experiments demonstrate an improvement in performance up to 57%, while also improving the novelty and diversity of recommendations.

2024

pdf bib abs
ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling
Sangryul Kim | Donghee Han | Sehyun Kim
Proceedings of the 6th Clinical Natural Language Processing Workshop

Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through fine-tuning model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL through log probability-based distribution, while grammatical and schema errors are mitigated by executing queries on the actual database.We experimentally verified that our method can filter unanswerable questions, which can be widely utilized even when the parameters of the model are not accessible, and that it can be effectively utilized in practice.

Co-authors

Hwanjun Song 1

Venues

Fix author