Seung-Hoon Na


pdf bib
RADCoT: Retrieval-Augmented Distillation to Specialization Models for Generating Chain-of-Thoughts in Query Expansion
Sung-Min Lee | Eunhwan Park | DongHyeon Jeon | Inho Kang | Seung-Hoon Na
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have demonstrated superior performance to that of small language models (SLM) in information retrieval for various subtasks including dense retrieval, reranking, query expansion, and pseudo-document generation. However, the parameter sizes of LLMs are extremely large, making it expensive to operate LLMs stably for providing LLM-based retrieval services. Recently, retrieval-augmented language models have been widely employed to significantly reduce the parameter size by retrieving relevant knowledge from large-scale corpora and exploiting the resulting “in-context” knowledge as additional model input, thereby substantially reducing the burden of internalizing and retaining world knowledge in model parameters. Armed by the retrieval-augmented language models, we present a retrieval-augmented model specialization that distills the capability of LLMs to generate the chain-of-thoughts (CoT) for query expansion – that is, injects the LLM’s capability to generate CoT into a retrieval-augmented SLM – referred to as RADCoT. Experimental results on the MS-MARCO, TREC DL 19, 20 datasets show that RADCoT yields consistent improvements over distillation without retrieval, achieving comparable performance to that of the query expansion method using LLM-based CoTs. Our code is publicly available at


pdf bib
MAFiD: Moving Average Equipped Fusion-in-Decoder for Question Answering over Tabular and Textual Data
Sung-Min Lee | Eunhwan Park | Daeryong Seo | Donghyeon Jeon | Inho Kang | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EACL 2023

Transformer-based models for question answering (QA) over tables and texts confront a “long” hybrid sequence over tabular and textual elements, causing long-range reasoning problems. To handle long-range reasoning, we extensively employ a fusion-in-decoder (FiD) and exponential moving average (EMA), proposing a Moving Average Equipped Fusion-in-Decoder (MAFiD). With FiD as the backbone architecture, MAFiD combines various levels of reasoning: independent encoding of homogeneous data and single-row and multi-row heterogeneous reasoning, using a gated cross attention layer to effectively aggregate the three types of representations resulting from various reasonings. Experimental results on HybridQA indicate that MAFiD achieves state-of-the-art performance by increasing exact matching (EM) and F1 by 1.1 and 1.7, respectively, on the blind test set.

pdf bib
DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative retrieval, which maps from a query to its relevant document identifiers (docids), has recently emerged as a new information retrieval (IR) paradigm, however, having suffered from 1) the lack of the intermediate reasoning step, caused by the manner of merely using a query to perform the hierarchical classification, and 2) the pretrain-finetune discrepancy, which comes from the use of the artificial symbols of docids. To address these limitations, we propose the novel approach of using the document generation from a query as an intermediate step before the retrieval, thus presenting  ̲diffusion-enhanced generative  ̲retrieval (DiffusionRet), which consists of two processing steps: 1) the diffusion-based document generation, which employs the sequence-to-sequence diffusion model to produce a pseudo document sample from a query, being expected to semantically close to a relevant document; 2) N-gram-based generative retrieval, which use another sequence-to-sequence model to generate n-grams that appear in the collection index for linking a generated sample to an original document. Experiment results on MS MARCO and Natural Questions dataset show that the proposed DiffusionRet significantly outperforms all the existing generative retrieval methods and leads to the state-of-the-art performances, even with much smaller number of parameters.

pdf bib
ExplainMeetSum: A Dataset for Explainable Meeting Summarization Aligned with Human Intent
Hyun Kim | Minsoo Cho | Seung-Hoon Na
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To enhance the explainability of meeting summarization, we construct a new dataset called “ExplainMeetSum,” an augmented version of QMSum, by newly annotating evidence sentences that faithfully “explain” a summary. Using ExplainMeetSum, we propose a novel multiple extractor guided summarization, namely Multi-DYLE, which extensively generalizes DYLE to enable using a supervised extractor based on human-aligned extractive oracles. We further present an explainability-aware task, named “Explainable Evidence Extraction” (E3), which aims to automatically detect all evidence sentences that support a given summary. Experimental results on the QMSum dataset show that the proposed Multi-DYLE outperforms DYLE with gains of up to 3.13 in the ROUGE-1 score. We further present the initial results on the E3 task, under the settings using separate and joint evaluation metrics.


pdf bib
LM-BFF-MS: Improving Few-Shot Fine-tuning of Language Models based on Multiple Soft Demonstration Memory
Eunhwan Park | Donghyeon Jeon | Seonhoon Kim | Inho Kang | Seung-Hoon Na
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LM-BFF (CITATION) achieves significant few-shot performance by using auto-generated prompts and adding demonstrations similar to an input example. To improve the approach of LM-BFF, this paper proposes LM-BFF-MSbetter few-shot fine-tuning of language models with multiple soft demonstrations by making its further extensions, which include 1) prompts with multiple demonstrations based on automatic generation of multiple label words; and 2) soft demonstration memory which consists of multiple sequences of globally shared word embeddings for a similar context. Experiments conducted on eight NLP tasks show that LM-BFF-MS leads to improvements over LM-BFF on five tasks, particularly achieving 94.0 and 90.4 on SST-2 and MRPC, respectively.

pdf bib
JBNU-CCLab at SemEval-2022 Task 7: DeBERTa for Identifying Plausible Clarifications in Instructional Texts
Daewook Kang | Sung-Min Lee | Eunhwan Park | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this study, we examine the ability of contextualized representations of pretrained language model to distinguish whether sequences from instructional articles are plausible or implausible. Towards this end, we compare the BERT, RoBERTa, and DeBERTa models using simple classifiers based on the sentence representations of the [CLS] tokens and perform a detailed analysis by visualizing the representations of the [CLS] tokens of the models. In the experimental results of Subtask A: Multi-Class Classification, DeBERTa exhibits the best performance and produces a more distinguishable representation across different labels. Submitting an ensemble of 10 DeBERTa-based models, our final system achieves an accuracy of 61.4% and is ranked fifth out of models submitted by eight teams. Further in-depth results suggest that the abilities of pretrained language models for the plausibility detection task are more strongly affected by their model structures or attention designs than by their model sizes.

pdf bib
JBNU-CCLab at SemEval-2022 Task 12: Machine Reading Comprehension and Span Pair Classification for Linking Mathematical Symbols to Their Descriptions
Sung-Min Lee | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system in the SemEval-2022 Task 12: ‘linking mathematical symbols to their descriptions’, achieving first on the leaderboard for all the subtasks comprising named entity extraction (NER) and relation extraction (RE). Our system is a two-stage pipeline model based on SciBERT that detects symbols, descriptions, and their relationships in scientific documents. The system consists of 1) machine reading comprehension(MRC)-based NER model, where each entity type is represented as a question and its entity mention span is extracted as an answer using an MRC model, and 2) span pair classification for RE, where two entity mentions and their type markers are encoded into span representations that are then fed to a Softmax classifier. In addition, we deploy a rule-based symbol tokenizer to improve the detection of the exact boundary of symbol entities. Regularization and ensemble methods are further explored to improve the RE model.

pdf bib
Frustratingly Easy System Combination for Grammatical Error Correction
Muhammad Reza Qorib | Seung-Hoon Na | Hwee Tou Ng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we formulate system combination for grammatical error correction (GEC) as a simple machine learning task: binary classification. We demonstrate that with the right problem formulation, a simple logistic regression algorithm can be highly effective for combining GEC models. Our method successfully increases the F0.5 score from the highest base GEC system by 4.2 points on the CoNLL-2014 test set and 7.2 points on the BEA-2019 test set. Furthermore, our method outperforms the state of the art by 4.0 points on the BEA-2019 test set, 1.2 points on the CoNLL-2014 test set with original annotation, and 3.4 points on the CoNLL-2014 test set with alternative annotation. We also show that our system combination generates better corrections with higher F0.5 scores than the conventional ensemble.

pdf bib
Proceedings of the 29th International Conference on Computational Linguistics
Nicoletta Calzolari | Chu-Ren Huang | Hansaem Kim | James Pustejovsky | Leo Wanner | Key-Sun Choi | Pum-Mo Ryu | Hsin-Hsi Chen | Lucia Donatelli | Heng Ji | Sadao Kurohashi | Patrizia Paggio | Nianwen Xue | Seokhwan Kim | Younggyun Hahm | Zhong He | Tony Kyungil Lee | Enrico Santus | Francis Bond | Seung-Hoon Na
Proceedings of the 29th International Conference on Computational Linguistics

pdf bib
SISER: Semantic-Infused Selective Graph Reasoning for Fact Verification
Eunhwan Park | Jong-Hyeon Lee | DongHyeon Jeon | Seonhoon Kim | Inho Kang | Seung-Hoon Na
Proceedings of the 29th International Conference on Computational Linguistics

This study proposes Semantic-Infused SElective Graph Reasoning (SISER) for fact verification, which newly presents semantic-level graph reasoning and injects its reasoning-enhanced representation into other types of graph-based and sequence-based reasoning methods. SISER combines three reasoning types: 1) semantic-level graph reasoning, which uses a semantic graph from evidence sentences, whose nodes are elements of a triple – <Subject, Verb, Object>, 2) “semantic-infused” sentence-level “selective” graph reasoning, which combine semantic-level and sentence-level representations and perform graph reasoning in a selective manner using the node selection mechanism, and 3) sequence reasoning, which concatenates all evidence sentences and performs attention-based reasoning. Experiment results on a large-scale dataset for Fact Extraction and VERification (FEVER) show that SISER outperforms the previous graph-based approaches and achieves state-of-the-art performance.


pdf bib
JBNU at SemEval-2020 Task 4: BERT and UniLM for Commonsense Validation and Explanation
Seung-Hoon Na | Jong-Hyeon Lee
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our contributions to the SemEval-2020 Task 4 Commonsense Validation and Explanation (ComVE) and includes the experimental results of the two Subtasks B and C of the SemEval-2020 Task 4. Our systems rely on pre-trained language models, i.e., BERT (including its variants) and UniLM, and rank 10th and 7th among 27 and 17 systems on Subtasks B and C, respectively. We analyze the commonsense ability of the existing pretrained language models by testing them on the SemEval-2020 Task 4 ComVE dataset, specifically for Subtasks B and C, the explanation subtasks with multi-choice and sentence generation, respectively.

pdf bib
JBNU at MRP 2020: AMR Parsing Using a Joint State Model for Graph-Sequence Iterative Inference
Seung-Hoon Na | Jinwoo Min
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

This paper describes the Jeonbuk National University (JBNU) system for the 2020 shared task on Cross-Framework Meaning Representation Parsing at the Conference on Computational Natural Language Learning. Among the five frameworks, we address only the abstract meaning representation framework and propose a joint state model for the graph-sequence iterative inference of (Cai and Lam, 2020) for a simplified graph-sequence inference. In our joint state model, we update only a single joint state vector during the graph-sequence inference process instead of keeping the dual state vectors, and all other components are exactly the same as in (Cai and Lam, 2020).


pdf bib
QE BERT: Bilingual BERT Using Multi-task Learning for Neural Quality Estimation
Hyun Kim | Joon-Ho Lim | Hyun-Ki Kim | Seung-Hoon Na
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

For translation quality estimation at word and sentence levels, this paper presents a novel approach based on BERT that recently has achieved impressive results on various natural language processing tasks. Our proposed model is re-purposed BERT for the translation quality estimation and uses multi-task learning for the sentence-level task and word-level subtasks (i.e., source word, target word, and target gap). Experimental results on Quality Estimation shared task of WMT19 show that our systems show competitive results and provide significant improvements over the baseline.

pdf bib
JBNU at MRP 2019: Multi-level Biaffine Attention for Semantic Dependency Parsing
Seung-Hoon Na | Jinwoon Min | Kwanghyeon Park | Jong-Hun Shin | Young-Kil Kim
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes Jeonbuk National University (JBNU)’s system for the 2019 shared task on Cross-Framework Meaning Representation Parsing (MRP 2019) at the Conference on Computational Natural Language Learning. Of the five frameworks, we address only the DELPH-IN MRS Bi-Lexical Dependencies (DP), Prague Semantic Dependencies (PSD), and Universal Conceptual Cognitive Annotation (UCCA) frameworks. We propose a unified parsing model using biaffine attention (Dozat and Manning, 2017), consisting of 1) a BERT-BiLSTM encoder and 2) a biaffine attention decoder. First, the BERT-BiLSTM for sentence encoder uses BERT to compose a sentence’s wordpieces into word-level embeddings and subsequently applies BiLSTM to word-level representations. Second, the biaffine attention decoder determines the scores for an edge’s existence and its labels based on biaffine attention functions between roledependent representations. We also present multi-level biaffine attention models by combining all the role-dependent representations that appear at multiple intermediate layers.


pdf bib
Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation
Hyun Kim | Jong-Hyeok Lee | Seung-Hoon Na
Proceedings of the Second Conference on Machine Translation

pdf bib
Concept Equalization to Guide Correct Training of Neural Machine Translation
Kangil Kim | Jong-Hun Shin | Seung-Hoon Na | SangKeun Jung
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural machine translation decoders are usually conditional language models to sequentially generate words for target sentences. This approach is limited to find the best word composition and requires help of explicit methods as beam search. To help learning correct compositional mechanisms in NMTs, we propose concept equalization using direct mapping distributed representations of source and target sentences. In a translation experiment from English to French, the concept equalization significantly improved translation quality by 3.00 BLEU points compared to a state-of-the-art NMT model.


pdf bib
Patent translation as technical document translation: customizing a Chinese-Korean MT system to patent domain
Yun Jin | Oh-Woog Kwon | Seung-Hoon Na | Young-Gil Kim
Proceedings of the 5th Workshop on Patent Translation


pdf bib
Search Result Clustering Using Label Language Model
Yeha Lee | Seung-Hoon Na | Jong-Hyeok Lee
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer
Chengguo Jin | Seung-Hoon Na | Dong-Il Kim | Jong-Hyeok Lee
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing


pdf bib
Conceptual Schema Approach to Natural Language Database Access
In-Su Kang | Seung-Hoon Na | Jong-Hyeok Lee
Proceedings of the Australasian Language Technology Workshop 2003