Seung-Hoon Na

Also published as: Seung-hoon Na, Seung Hoon Na

2025

KyuHyunChoi at SemEval-2025 Task 10: Narrative Extraction Using a Summarization-Specific Pretrained Model
Kyu Hyun Choi | Seung Hoon Na
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Task 11 of SemEval 2025 was proposed to develop supporting information for analyzing the risks of misinformation and propaganda in news articles. In this study, we selected Sub-task 3—which involves generating evidence explaining why a particular dominant narrative is labeled in an article—and fine-tuned PEGASUS for this purpose, achieving the best performance in the competition.

pdf bib abs

SeqMMR: Sequential Model Merging and LLM Routing for Enhanced Batched Sequential Knowledge Editing
Shanbao Qiao | Xuebing Liu | Akshat Gupta | Seung-Hoon Na
Findings of the Association for Computational Linguistics: ACL 2025

Model knowledge editing enables the efficient correction of erroneous information and the continuous updating of outdated knowledge within language models. While existing research has demonstrated strong performance in single-instance or few-instance sequential editing and one-time massive editing scenarios, the batched sequential editing paradigm remains a significant challenge. The primary issue lies in the model’s tendency to gradually forget previously edited knowledge and become increasingly unstable after multiple iterations of batched editing. To address these challenges, we propose **SeqMMR**, an enhanced framework for batched sequential knowledge editing that leverages **Seq**uential **M**odel **M**erging and a model **R**outer. Our approach iteratively merges parameters from current batch-edited models with those of their predecessors, ensuring that newly emerging knowledge is integrated while mitigating the forgetting of previously edited knowledge. Furthermore, the model router directs queries unrelated to the edited knowledge to an unedited model backup, preventing unintended alterations in model predictions. Extensive experiments across various datasets demonstrate that our approach effectively mitigates knowledge forgetting, improves performance across all previous batches, and better preserves the model’s general capabilities.

pdf bib abs

GenPoE: Generative Passage-level Mixture of Experts for Knowledge Enhancement of LLMs
Xuebing Liu | Shanbao Qiao | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2025

Typically, parametric adaptation methods such as domain-adaptive pretraining (DAP) and retrieval-augmented generation (RAG) have been considered effective approaches for adapting large language models (LLMs) to new knowledge or domains. To unify positive effects of parametric adaptation and RAG, this paper proposes GenPoE, i.e., “generative’’ passage-level mixture of experts (MoEs) for enhancing knowledge of LLMs. The key component is its novel MoE-generating hypernetwork which takes in-context retrieved passages and generates their “expert’’ parameters, where these generated parameters are then integrated into LLMs by forming expert networks. With its use of “generated’’ parameters, GenPoE does not require a separate parameter training or finetuning stage, which is often costly. By parameterizing passages into expert networks, GenPoE likely exhibits robustness even when the retrieved passages are irrelevant. Experiment results in two open-domain question answering (QA) tasks present that GenPoE shows improved performances over other passage-level knowledge editing, and its combination of RAG produces superior performances over RAG. Our data and code will be available at https://github.com/Liu-Xuebing/GenPoE.

2024

pdf bib abs

GeminiPro at SemEval-2024 Task 9: BrainTeaser on Gemini
Kyu Hyun Choi | Seung-hoon Na
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

It is known that human thought can be distinguished into lateral and vertical thinking. The development of language models has thus far been focused on evaluating and advancing vertical thinking, while lateral thinking has been somewhat neglected. To foster progress in this area, SemEval has created and distributed a brainteaser dataset based on lateral thinking consist of sentence puzzles and word puzzle QA. In this paper, we test and discuss the performance of the currently known best model, Gemini, on this dataset.

pdf bib abs

SARCAT: Generative Span-Act Guided Response Generation using Copy-enhanced Target Augmentation
Jeong-Doo Lee | Hyeongjun Choi | Beomseok Hong | Youngsub Han | Byoung-Ki Jeon | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2024

In this paper, we present a novel extension to improve the document grounded response generation, by proposing the Generative Span Act Guided Response Generation using Copy enhanced Target Augmentation (SARCAT) that consists of two major components as follows: 1) Copy-enhanced target-side input augmentation is an extended data augmentation to deal with the exposure bias problem by additionally incorporating the copy mechanism on top of the target-side augmentation (Xie et al., 2021). 2) Span-act guided response generation, which first predicts grounding spans and dialogue acts before generating a response. Experiment results on validation set in MultiDoc2Dial show that the proposed SARSAT leads to improvement over strong baselines on both seen and unseen settings and achieves the start-of the-art performance, even with the base reader using the pretrained T5-base model.

pdf bib abs

DistillMIKE: Editing Distillation of Massive In-Context Knowledge Editing in Large Language Models
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: ACL 2024

Among the recently emerged knowledge editing methods, in-context knowledge editing (IKE) has shown respectable abilities on knowledge editing in terms of generalization and specificity. Noting the promising advantages but unexplored issues of IKE, we propose **DistillMIKE** as a novel extension of IKE, i.e., editing **distill**ation of "**M**assive” **I**n-context **K**nowledge **E**diting in large language models (LLMs), mainly consisting of two expansions; 1) *Massive in-context knowledge editing (MIKE)*, which extends IKE to a massive editing task, aiming to inject not a single edit but a set of massive edits to LLMs; To preserve specificity, our key novel extension is a “selective” retrieval augmentation, where the retrieval-augmented IKE is only applied to “in-scope” examples, whereas the unedited model without IKE is employed for “out-of-scope” ones. 2) *Editing distillation* of MIKE using low-rank adaptation (LoRA), which distills editing abilities of MIKE to parameters of LLMs in a manner of eliminating the need of lengthy in-context demonstrations, thus removing the computational overhead encountered at the inference time. Experimental results on the zsRE and CounterFact datasets demonstrate that MIKE shows the state-of-the-art perfomrances and DistilMIKE show comparable performances with MIKE. Our code is available at https://github.com/JoveReCode/DistillMIKE.git.

pdf bib

COMEM: In-Context Retrieval-Augmented Mass-Editing Memory in Large Language Models
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: NAACL 2024

pdf bib abs

RADCoT: Retrieval-Augmented Distillation to Specialization Models for Generating Chain-of-Thoughts in Query Expansion
Sung-Min Lee | Eunhwan Park | DongHyeon Jeon | Inho Kang | Seung-Hoon Na
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have demonstrated superior performance to that of small language models (SLM) in information retrieval for various subtasks including dense retrieval, reranking, query expansion, and pseudo-document generation. However, the parameter sizes of LLMs are extremely large, making it expensive to operate LLMs stably for providing LLM-based retrieval services. Recently, retrieval-augmented language models have been widely employed to significantly reduce the parameter size by retrieving relevant knowledge from large-scale corpora and exploiting the resulting “in-context” knowledge as additional model input, thereby substantially reducing the burden of internalizing and retaining world knowledge in model parameters. Armed by the retrieval-augmented language models, we present a retrieval-augmented model specialization that distills the capability of LLMs to generate the chain-of-thoughts (CoT) for query expansion – that is, injects the LLM’s capability to generate CoT into a retrieval-augmented SLM – referred to as RADCoT. Experimental results on the MS-MARCO, TREC DL 19, 20 datasets show that RADCoT yields consistent improvements over distillation without retrieval, achieving comparable performance to that of the query expansion method using LLM-based CoTs. Our code is publicly available at https://github.com/ZIZUN/RADCoT.

2023

pdf bib abs

ExplainMeetSum: A Dataset for Explainable Meeting Summarization Aligned with Human Intent
Hyun Kim | Minsoo Cho | Seung-Hoon Na
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To enhance the explainability of meeting summarization, we construct a new dataset called “ExplainMeetSum,” an augmented version of QMSum, by newly annotating evidence sentences that faithfully “explain” a summary. Using ExplainMeetSum, we propose a novel multiple extractor guided summarization, namely Multi-DYLE, which extensively generalizes DYLE to enable using a supervised extractor based on human-aligned extractive oracles. We further present an explainability-aware task, named “Explainable Evidence Extraction” (E3), which aims to automatically detect all evidence sentences that support a given summary. Experimental results on the QMSum dataset show that the proposed Multi-DYLE outperforms DYLE with gains of up to 3.13 in the ROUGE-1 score. We further present the initial results on the E3 task, under the settings using separate and joint evaluation metrics.

pdf bib abs

DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative retrieval, which maps from a query to its relevant document identifiers (docids), has recently emerged as a new information retrieval (IR) paradigm, however, having suffered from 1) the lack of the intermediate reasoning step, caused by the manner of merely using a query to perform the hierarchical classification, and 2) the pretrain-finetune discrepancy, which comes from the use of the artificial symbols of docids. To address these limitations, we propose the novel approach of using the document generation from a query as an intermediate step before the retrieval, thus presenting ̲diffusion-enhanced generative ̲retrieval (DiffusionRet), which consists of two processing steps: 1) the diffusion-based document generation, which employs the sequence-to-sequence diffusion model to produce a pseudo document sample from a query, being expected to semantically close to a relevant document; 2) N-gram-based generative retrieval, which use another sequence-to-sequence model to generate n-grams that appear in the collection index for linking a generated sample to an original document. Experiment results on MS MARCO and Natural Questions dataset show that the proposed DiffusionRet significantly outperforms all the existing generative retrieval methods and leads to the state-of-the-art performances, even with much smaller number of parameters.

pdf bib abs

MAFiD: Moving Average Equipped Fusion-in-Decoder for Question Answering over Tabular and Textual Data
Sung-Min Lee | Eunhwan Park | Daeryong Seo | Donghyeon Jeon | Inho Kang | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EACL 2023

Transformer-based models for question answering (QA) over tables and texts confront a “long” hybrid sequence over tabular and textual elements, causing long-range reasoning problems. To handle long-range reasoning, we extensively employ a fusion-in-decoder (FiD) and exponential moving average (EMA), proposing a Moving Average Equipped Fusion-in-Decoder (MAFiD). With FiD as the backbone architecture, MAFiD combines various levels of reasoning: independent encoding of homogeneous data and single-row and multi-row heterogeneous reasoning, using a gated cross attention layer to effectively aggregate the three types of representations resulting from various reasonings. Experimental results on HybridQA indicate that MAFiD achieves state-of-the-art performance by increasing exact matching (EM) and F1 by 1.1 and 1.7, respectively, on the blind test set.

2022

pdf bib abs

Frustratingly Easy System Combination for Grammatical Error Correction
Muhammad Reza Qorib | Seung-Hoon Na | Hwee Tou Ng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we formulate system combination for grammatical error correction (GEC) as a simple machine learning task: binary classification. We demonstrate that with the right problem formulation, a simple logistic regression algorithm can be highly effective for combining GEC models. Our method successfully increases the F0.5 score from the highest base GEC system by 4.2 points on the CoNLL-2014 test set and 7.2 points on the BEA-2019 test set. Furthermore, our method outperforms the state of the art by 4.0 points on the BEA-2019 test set, 1.2 points on the CoNLL-2014 test set with original annotation, and 3.4 points on the CoNLL-2014 test set with alternative annotation. We also show that our system combination generates better corrections with higher F0.5 scores than the conventional ensemble.

pdf bib abs

LM-BFF-MS: Improving Few-Shot Fine-tuning of Language Models based on Multiple Soft Demonstration Memory
Eunhwan Park | Donghyeon Jeon | Seonhoon Kim | Inho Kang | Seung-Hoon Na
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LM-BFF (CITATION) achieves significant few-shot performance by using auto-generated prompts and adding demonstrations similar to an input example. To improve the approach of LM-BFF, this paper proposes LM-BFF-MS—better few-shot fine-tuning of language models with multiple soft demonstrations by making its further extensions, which include 1) prompts with multiple demonstrations based on automatic generation of multiple label words; and 2) soft demonstration memory which consists of multiple sequences of globally shared word embeddings for a similar context. Experiments conducted on eight NLP tasks show that LM-BFF-MS leads to improvements over LM-BFF on five tasks, particularly achieving 94.0 and 90.4 on SST-2 and MRPC, respectively.

pdf bib abs

This study proposes Semantic-Infused SElective Graph Reasoning (SISER) for fact verification, which newly presents semantic-level graph reasoning and injects its reasoning-enhanced representation into other types of graph-based and sequence-based reasoning methods. SISER combines three reasoning types: 1) semantic-level graph reasoning, which uses a semantic graph from evidence sentences, whose nodes are elements of a triple – <Subject, Verb, Object>, 2) “semantic-infused” sentence-level “selective” graph reasoning, which combine semantic-level and sentence-level representations and perform graph reasoning in a selective manner using the node selection mechanism, and 3) sequence reasoning, which concatenates all evidence sentences and performs attention-based reasoning. Experiment results on a large-scale dataset for Fact Extraction and VERification (FEVER) show that SISER outperforms the previous graph-based approaches and achieves state-of-the-art performance.

pdf bib abs

JBNU-CCLab at SemEval-2022 Task 7: DeBERTa for Identifying Plausible Clarifications in Instructional Texts
Daewook Kang | Sung-Min Lee | Eunhwan Park | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this study, we examine the ability of contextualized representations of pretrained language model to distinguish whether sequences from instructional articles are plausible or implausible. Towards this end, we compare the BERT, RoBERTa, and DeBERTa models using simple classifiers based on the sentence representations of the [CLS] tokens and perform a detailed analysis by visualizing the representations of the [CLS] tokens of the models. In the experimental results of Subtask A: Multi-Class Classification, DeBERTa exhibits the best performance and produces a more distinguishable representation across different labels. Submitting an ensemble of 10 DeBERTa-based models, our final system achieves an accuracy of 61.4% and is ranked fifth out of models submitted by eight teams. Further in-depth results suggest that the abilities of pretrained language models for the plausibility detection task are more strongly affected by their model structures or attention designs than by their model sizes.

pdf bib abs

JBNU-CCLab at SemEval-2022 Task 12: Machine Reading Comprehension and Span Pair Classification for Linking Mathematical Symbols to Their Descriptions
Sung-Min Lee | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system in the SemEval-2022 Task 12: ‘linking mathematical symbols to their descriptions’, achieving first on the leaderboard for all the subtasks comprising named entity extraction (NER) and relation extraction (RE). Our system is a two-stage pipeline model based on SciBERT that detects symbols, descriptions, and their relationships in scientific documents. The system consists of 1) machine reading comprehension(MRC)-based NER model, where each entity type is represented as a question and its entity mention span is extracted as an answer using an MRC model, and 2) span pair classification for RE, where two entity mentions and their type markers are encoded into span representations that are then fed to a Softmax classifier. In addition, we deploy a rule-based symbol tokenizer to improve the detection of the exact boundary of symbol entities. Regularization and ensemble methods are further explored to improve the RE model.

2020

pdf bib abs

JBNU at SemEval-2020 Task 4: BERT and UniLM for Commonsense Validation and Explanation
Seung-Hoon Na | Jong-Hyeon Lee
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our contributions to the SemEval-2020 Task 4 Commonsense Validation and Explanation (ComVE) and includes the experimental results of the two Subtasks B and C of the SemEval-2020 Task 4. Our systems rely on pre-trained language models, i.e., BERT (including its variants) and UniLM, and rank 10th and 7th among 27 and 17 systems on Subtasks B and C, respectively. We analyze the commonsense ability of the existing pretrained language models by testing them on the SemEval-2020 Task 4 ComVE dataset, specifically for Subtasks B and C, the explanation subtasks with multi-choice and sentence generation, respectively.

pdf bib abs

JBNU at MRP 2020: AMR Parsing Using a Joint State Model for Graph-Sequence Iterative Inference
Seung-Hoon Na | Jinwoo Min
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

This paper describes the Jeonbuk National University (JBNU) system for the 2020 shared task on Cross-Framework Meaning Representation Parsing at the Conference on Computational Natural Language Learning. Among the five frameworks, we address only the abstract meaning representation framework and propose a joint state model for the graph-sequence iterative inference of (Cai and Lam, 2020) for a simplified graph-sequence inference. In our joint state model, we update only a single joint state vector during the graph-sequence inference process instead of keeping the dual state vectors, and all other components are exactly the same as in (Cai and Lam, 2020).

2019

pdf bib abs

QE BERT: Bilingual BERT Using Multi-task Learning for Neural Quality Estimation
Hyun Kim | Joon-Ho Lim | Hyun-Ki Kim | Seung-Hoon Na
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

For translation quality estimation at word and sentence levels, this paper presents a novel approach based on BERT that recently has achieved impressive results on various natural language processing tasks. Our proposed model is re-purposed BERT for the translation quality estimation and uses multi-task learning for the sentence-level task and word-level subtasks (i.e., source word, target word, and target gap). Experimental results on Quality Estimation shared task of WMT19 show that our systems show competitive results and provide significant improvements over the baseline.

pdf bib abs

JBNU at MRP 2019: Multi-level Biaffine Attention for Semantic Dependency Parsing
Seung-Hoon Na | Jinwoon Min | Kwanghyeon Park | Jong-Hun Shin | Young-Kil Kim
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes Jeonbuk National University (JBNU)’s system for the 2019 shared task on Cross-Framework Meaning Representation Parsing (MRP 2019) at the Conference on Computational Natural Language Learning. Of the five frameworks, we address only the DELPH-IN MRS Bi-Lexical Dependencies (DP), Prague Semantic Dependencies (PSD), and Universal Conceptual Cognitive Annotation (UCCA) frameworks. We propose a unified parsing model using biaffine attention (Dozat and Manning, 2017), consisting of 1) a BERT-BiLSTM encoder and 2) a biaffine attention decoder. First, the BERT-BiLSTM for sentence encoder uses BERT to compose a sentence’s wordpieces into word-level embeddings and subsequently applies BiLSTM to word-level representations. Second, the biaffine attention decoder determines the scores for an edge’s existence and its labels based on biaffine attention functions between roledependent representations. We also present multi-level biaffine attention models by combining all the role-dependent representations that appear at multiple intermediate layers.

2017

pdf bib

Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation
Hyun Kim | Jong-Hyeok Lee | Seung-Hoon Na
Proceedings of the Second Conference on Machine Translation

pdf bib abs

Concept Equalization to Guide Correct Training of Neural Machine Translation
Kangil Kim | Jong-Hun Shin | Seung-Hoon Na | SangKeun Jung
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural machine translation decoders are usually conditional language models to sequentially generate words for target sentences. This approach is limited to find the best word composition and requires help of explicit methods as beam search. To help learning correct compositional mechanisms in NMTs, we propose concept equalization using direct mapping distributed representations of source and target sentences. In a translation experiment from English to French, the concept equalization significantly improved translation quality by 3.00 BLEU points compared to a state-of-the-art NMT model.