Jihyuk Kim


2024

QuBE: Question-based Belief Enhancement for Agentic LLM Reasoning
Minsoo Kim | Jongyoon Kim | Jihyuk Kim | Seung-won Hwang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Despite advancements in Large Language Models (LLMs), many complex tasks are not easily solved in a single inference step, requiring the use of agentic LLMs in interactive environments. However, agentic LLMs suffer from a phenomenon known as reasoning derailment, due to the indiscriminate incorporation of observations from partially observable environments. We introduce QuBE, a method that enhances agents’ focus on task-relevant contexts by constructing a belief state via question answering. We validate QuBE through experiments in two agentic LLM scenarios with partial observability: 1) a canonical interactive decision-making scenario using text-based game engines, and 2) an interactive retrieval-augmented generation (RAG) scenario using search engines. In the AlfWorld text-based game, QuBE outperforms established baselines by substantial margins, and in the search engine scenario, it achieves marked improvements on the BEIR zero-shot retrieval benchmark. The results demonstrate that QuBE significantly mitigates reasoning derailment, refining the decision-making process of LLM agents in partially observed environments.
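As a rough illustration of the core idea, a belief state can be maintained by answering a fixed set of task-focused questions over each new observation, rather than appending observations verbatim. The sketch below is a minimal reconstruction under that assumption; the `llm` helper, the question set, and the prompt wording are all hypothetical, not the paper's actual design.

```python
def llm(prompt: str) -> str:
    """Placeholder for any chat/completion model call (hypothetical)."""
    raise NotImplementedError

# Hypothetical question set; the paper's actual questions may differ.
BELIEF_QUESTIONS = [
    "What is the current goal?",
    "Which observed objects or facts are relevant to the goal?",
    "What remains unknown and should be explored next?",
]

def update_belief(belief: str, observation: str) -> str:
    """Refresh the belief state by answering task-focused questions about
    the new observation, instead of appending the observation verbatim."""
    answers = [
        llm(f"Belief so far:\n{belief}\n\nNew observation:\n{observation}\n\n"
            f"Question: {q}\nAnswer concisely:")
        for q in BELIEF_QUESTIONS
    ]
    return "\n".join(f"- {q} {a}" for q, a in zip(BELIEF_QUESTIONS, answers))

def next_action(belief: str) -> str:
    """Condition the action on the distilled belief state rather than the
    raw, possibly distracting, observation history."""
    return llm(f"Belief state:\n{belief}\n\nNext action:")
```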

2023

Retrieval-augmented Video Encoding for Instructional Captioning
Yeonjoon Jung | Minsoo Kim | Seungtaek Choi | Jihyuk Kim | Minji Seo | Seung-won Hwang
Findings of the Association for Computational Linguistics: ACL 2023

Instructional videos make learning more efficient by providing a detailed multimodal context for each procedure in the instruction. A unique challenge posed by instructional videos is key-object degeneracy, where no single modality sufficiently captures the key objects referred to in the procedure. For machine systems, such degeneracy can hurt downstream tasks such as dense video captioning, leading to incorrect captions that omit key objects. To repair this, we propose a retrieval-based framework that augments the model representations in the presence of key-object degeneracy. We validate the effectiveness and generalizability of the proposed framework over baselines on modalities exhibiting key-object degeneracy.
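The abstract does not specify the retrieval mechanism, so the following is only a hedged sketch of the general retrieval-augmentation pattern it describes: fetch representations of similar segments from a memory and fuse them with the degenerate one. The cosine retrieval and mean-pooling fusion here are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, memory: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the k memory rows most cosine-similar to the query segment."""
    sims = memory @ query_vec / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return memory[np.argsort(-sims)[:k]]

def augment(segment_vec: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Repair a degenerate segment representation by averaging it with
    retrieved neighbors that likely capture the same key objects."""
    neighbors = retrieve(segment_vec, memory)
    return np.vstack([segment_vec[None, :], neighbors]).mean(axis=0)
```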

Relevance-assisted Generation for Robust Zero-shot Retrieval
Jihyuk Kim | Minsoo Kim | Joonsuk Park | Seung-won Hwang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Zero-shot retrieval tasks such as the BEIR benchmark reveal out-of-domain generalization as a key weakness of high-performance dense retrievers. As a solution, domain adaptation for dense retrievers has been actively studied. A notable approach synthesizes domain-specific data by generating pseudo queries (PQ), then fine-tunes on the domain-specific relevance between PQ and documents. Our contribution is showing that key biases can cause sampled PQ to be irrelevant, contributing negatively to generalization. We propose to preempt the generation of such irrelevant PQ by dividing generation into simpler subtasks: first generating relevance explanations, then using them to guide query generation away from negative generalization. Experimental results show that our proposed approach is more robust to domain shifts, as validated on challenging BEIR zero-shot retrieval tasks.
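A minimal sketch of the subtask decomposition described above, assuming a generic `llm(prompt) -> str` completion call; the prompt wording and the exact two-step split are illustrative, not the paper's pipeline.

```python
def llm(prompt: str) -> str:
    """Placeholder for any text-generation model call (hypothetical)."""
    raise NotImplementedError

def generate_pseudo_query(document: str) -> tuple[str, str]:
    """First explain *why* the document would be relevant to some query,
    then generate the query conditioned on that explanation, so the
    sampled query stays grounded in the document."""
    explanation = llm(
        f"Document:\n{document}\n\n"
        "Explain what information need this document satisfies:"
    )
    query = llm(
        f"Document:\n{document}\nRelevance explanation:\n{explanation}\n\n"
        "Write a search query this document answers:"
    )
    return explanation, query
```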

2022

Collective Relevance Labeling for Passage Retrieval
Jihyuk Kim | Minsoo Kim | Seung-won Hwang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Deep learning for Information Retrieval (IR) requires a large amount of high-quality query-document relevance labels, but such labels are inherently sparse. Label smoothing redistributes some observed probability mass over unobserved instances, often uniformly and uninformed of the true distribution. In contrast, we propose knowledge distillation for informed labeling, without incurring high computational overhead at evaluation time. Our contribution is designing a simple but efficient teacher model that utilizes collective knowledge, outperforming state-of-the-art models distilled from a more complex teacher. Specifically, we train up to 8× faster than the state-of-the-art teacher while distilling rankings better. Our code is publicly available at https://github.com/jihyukkim-nlp/CollectiveKD.
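For intuition, the contrast with uniform label smoothing can be shown with a standard distillation objective: the student matches the teacher's softened relevance distribution over candidate passages. This KL formulation is the conventional distillation loss, shown here as an assumption rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over candidate passages, for score tensors of
    shape (num_queries, num_candidates). Unlike uniform label smoothing, the
    teacher spreads probability mass by its informed estimate of relevance."""
    t = F.softmax(teacher_scores / temperature, dim=-1)
    s = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```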

Pseudo-Relevance for Enhancing Document Representation
Jihyuk Kim | Seung-won Hwang | Seoho Song | Hyeseon Ko | Young-In Song
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

This paper studies how to enhance document representations for the bi-encoder approach in dense document retrieval. The bi-encoder, which encodes a query and a document separately, each as a single vector, is favored for its high efficiency in large-scale information retrieval, compared to more effective but complex architectures. To combine the strengths of the two, multi-vector document representations for bi-encoders, such as ColBERT preserving all token embeddings, have been widely adopted. Our contribution is to reduce the size of the multi-vector representation without compromising effectiveness, supervised by query logs. Our proposed solution decreases latency and memory footprint by up to 8- and 3-fold, respectively, validated on MS MARCO and real-world search query logs.
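As a hedged sketch of what a reduced multi-vector representation looks like at scoring time, the snippet below keeps only k document token vectors and scores with ColBERT-style late interaction (MaxSim). Selecting vectors by embedding norm is an illustrative placeholder for the paper's query-log-supervised selection.

```python
import torch

def prune_doc(doc_vecs: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k document token embeddings with the largest L2 norm
    (a stand-in criterion; the paper learns which vectors to keep)."""
    keep = doc_vecs.norm(dim=-1).topk(min(k, doc_vecs.size(0))).indices
    return doc_vecs[keep]

def late_interaction(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim: sum over query tokens of the maximum
    similarity to any kept document token."""
    return (query_vecs @ doc_vecs.T).max(dim=-1).values.sum()
```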

2021

Structure-Augmented Keyphrase Generation
Jihyuk Kim | Myeongho Jeong | Seungtaek Choi | Seung-won Hwang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper studies the keyphrase generation (KG) task for scenarios where structure plays an important role. For example, a scientific publication consists of a short title and a long body, where the title can be used to de-emphasize unimportant details in the body. Similarly, short social media posts (e.g., tweets) have scarce context that can be augmented from titles, though these are often missing. Our contribution is generating/augmenting structure and then injecting this information into the encoding, using existing keyphrases of other documents to complement missing or incomplete titles. We propose novel structure-augmented document encoding approaches consisting of two phases: the first phase, generating structure, extends the given document with related but absent keyphrases, augmenting missing context; the second phase, encoding structure, builds a graph over the keyphrases and the given document to obtain a structure-aware representation of the augmented text. Our empirical results validate that the proposed structure augmentation and augmentation-aware encoding/decoding improve KG in both scenarios, outperforming the state-of-the-art.
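A minimal sketch of the two phases' data flow, with a hypothetical `related_keyphrases` retriever standing in for phase one and a simple star graph for phase two; the actual model uses a learned graph encoder, so this only illustrates the structure being built.

```python
import networkx as nx

def related_keyphrases(document: str) -> list[str]:
    """Hypothetical retriever returning keyphrases of similar documents."""
    raise NotImplementedError

def build_structure(document: str) -> nx.Graph:
    # Phase 1 (generating structure): augment missing context with
    # related but absent keyphrases drawn from other documents.
    phrases = related_keyphrases(document)
    # Phase 2 (encoding structure): connect the document node to each
    # keyphrase node, giving a downstream encoder a structure-aware
    # view of the augmented text.
    g = nx.Graph()
    g.add_node("doc", text=document)
    for p in phrases:
        g.add_edge("doc", p)
    return g
```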