Dongsuk Oh
2025
Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data
Changwon Ok | Eunkyeong Lee | Dongsuk Oh
Proceedings of the 31st International Conference on Computational Linguistics
Recently, large language models (LLMs) have made significant progress through retrieval-augmented generation (RAG) and preference learning. However, they still exhibit issues such as confirmation bias, the tendency to favor information that confirms one’s beliefs, which remains largely unexplored in current research. In this paper, we propose a novel approach to mitigating confirmation-bias-induced hallucination in LLMs through a synthetic data construction pipeline and Direct Preference Optimization (DPO) training. Our method enhances the integration of diverse and complementary information from multiple passages retrieved by RAG, enabling more balanced and accurate reasoning. Experimental results demonstrate significant improvements in response accuracy and reduced hallucination on benchmarks such as Natural Questions Open and HaluBench. These findings suggest that our approach effectively mitigates confirmation bias in long-context question answering, with potential applications to other NLP tasks. We release our data and evaluation/training code for public access: https://github.com/OccasionallyNLP/Synthetic-Paths-to-Integral-Truth.git
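To make the training setup concrete, here is a minimal sketch (not the authors' released pipeline; the function name and record fields are illustrative) of how a DPO preference pair could reward an answer that integrates several retrieved passages over one that echoes a single, belief-confirming passage:

```python
# Illustrative sketch: constructing one DPO preference record in the
# common (prompt, chosen, rejected) triplet format. The "chosen" answer
# synthesizes complementary evidence from all passages; the "rejected"
# answer relies on a single passage, mimicking confirmation bias.

def make_dpo_pair(question, passages, integrated_answer, single_passage_answer):
    """Return one preference record for DPO-style training."""
    prompt = question + "\n\n" + "\n".join(
        f"[Passage {i + 1}] {p}" for i, p in enumerate(passages)
    )
    return {
        "prompt": prompt,
        "chosen": integrated_answer,       # integrates all passages
        "rejected": single_passage_answer, # confirms only the first passage
    }

pair = make_dpo_pair(
    "Who wrote the novel, and when was it published?",
    ["Passage A says author X wrote it.", "Passage B adds the year 1951."],
    "Author X wrote it, and it was published in 1951 (Passages A and B).",
    "Author X wrote it.",  # ignores the complementary evidence in Passage B
)
```

A preference dataset built from such records can then be passed to a standard DPO trainer, which optimizes the model to prefer the integrated response.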
2022
Don’t Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling
Dongsuk Oh | Yejin Kim | Hodong Lee | H. Howie Huang | Heuiseok Lim
Proceedings of the 29th International Conference on Computational Linguistics
Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred for deriving sentence representations from PLMs. This paper introduces an attention-based pooling strategy that enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. The contrastive learning objective adapts layer-wise attention pooling to both unsupervised and supervised settings, regularizing the anisotropic space of pre-trained embeddings and making it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improves the performance of the contrastively learned BERT-base model and its variants.
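The core pooling idea can be sketched in a few lines. This is a simplified NumPy illustration, not the paper's implementation: it assumes one sentence vector per layer (e.g. the [CLS] state) and a single learned attention query, and shows how a softmax over layers produces a weighted combination instead of taking only the last layer:

```python
import numpy as np

def layer_wise_attention_pooling(layer_vecs, query):
    """Pool per-layer sentence vectors with attention over layers.

    layer_vecs: (num_layers, hidden) array, one sentence vector per layer.
    query: (hidden,) learned attention parameter (an assumption here).
    Returns a single (hidden,) sentence representation.
    """
    scores = layer_vecs @ query                  # one scalar score per layer
    scores = scores - scores.max()               # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum() # softmax over layers
    return attn @ layer_vecs                     # attention-weighted sum

rng = np.random.default_rng(0)
layers = rng.normal(size=(13, 768))  # e.g. BERT-base: embedding layer + 12 blocks
query = rng.normal(size=768)
sent_vec = layer_wise_attention_pooling(layers, query)
```

In training, the attention query would be learned jointly with the contrastive objective, so the model decides which layers' signals to emphasize.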
2020
I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
Jungwoo Lim | Dongsuk Oh | Yoonna Jang | Kisu Yang | Heuiseok Lim
Proceedings of the 28th International Conference on Computational Linguistics
CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve performance with distributed representations, without considering the process of predicting the answer from the semantic representation of the question. To shed light on the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a fully integrated graph encompassing the Abstract Meaning Representation (AMR) graph generated from the input question and an external commonsense knowledge graph, ConceptNet (CN). The ACP graph is then exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.
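The pruning step can be illustrated with a toy graph. This is a hypothetical simplification of ACP-style pruning (the function and hop limit are assumptions, not the paper's algorithm): only nodes reachable within a few hops from both a question concept and an answer concept are kept, so surviving edges lie on short reasoning paths connecting the two:

```python
from collections import deque

def prune_to_paths(edges, question_concepts, answer_concepts, max_hops=2):
    """Keep only edges whose endpoints are within max_hops of BOTH
    a question concept and an answer concept (illustrative sketch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def reachable(seeds):
        # Breadth-first search bounded by max_hops.
        seen = set(seeds)
        frontier = deque((s, 0) for s in seeds)
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return seen

    keep = reachable(question_concepts) & reachable(answer_concepts)
    return [(u, v) for u, v in edges if u in keep and v in keep]

# Toy integrated graph: "mountain"/"rock" are off the question-answer path.
edges = [("river", "water"), ("water", "drink"),
         ("river", "mountain"), ("mountain", "rock")]
pruned = prune_to_paths(edges, {"river"}, {"drink"})
```

After pruning, only the `river - water - drink` path survives, which is the kind of compact reasoning path the ACP graph is meant to expose.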