Taeuk Kim - ACL Anthology

Taeuk Kim

2026

Think Just Enough: Leveraging Self-Assessed Confidence for Adaptive Reasoning in Language Models
Junyeob Kim | Sang-goo Lee | Taeuk Kim
Findings of the Association for Computational Linguistics: EACL 2026

Recent reinforcement learning (RL)-trained language models have demonstrated strong performance on complex reasoning tasks by producing long and detailed reasoning traces. However, despite these advancements, they often struggle with finding the right balance in reasoning length: some terminate prematurely before reaching a correct answer (underthinking), while others continue reasoning beyond necessity, leading to inefficiency or even degraded accuracy (overthinking).To address these challenges, we propose a method for optimizing reasoning length via self-assessed confidence. By prompting the model to evaluate its own confidence at intermediate reasoning steps, we enable dynamic stopping once sufficient reasoning is achieved.Experiments across multiple reasoning benchmarks show that our approach improves computational efficiency without compromising answer quality. Furthermore, we find that confidence estimates from RL-trained reasoning models are more reliable than those from standard LLMs, making it a valuable internal signal for controlling reasoning depth.

2025

When to Speak, When to Abstain: Contrastive Decoding with Abstention
Hyuhng Joon Kim | Youna Kim | Sang-goo Lee | Taeuk Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks by leveraging pre-trained (i.e., parametric) and external (i.e., contextual) knowledge. While substantial efforts have been made to enhance the utilization of both forms of knowledge, situations in which models lack relevant information remain underexplored. To investigate this challenge, we first present a controlled testbed featuring four distinct knowledge access scenarios, including the aforementioned edge case, revealing that conventional LLM usage exhibits insufficient robustness in handling all instances. Addressing this limitation, we propose Contrastive Decoding with Abstention (CDA), a novel training-free decoding method that allows LLMs to generate responses when relevant knowledge is available and to abstain otherwise. CDA estimates the relevance of both knowledge sources for a given input, adaptively deciding which type of information to prioritize and which to exclude. Through extensive experiments, we demonstrate that CDA can effectively perform accurate generation and abstention simultaneously, enhancing reliability and preserving user trust.

Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents
Yejin Yoon | Yuri Son | Namyoung So | Minseo Kim | Minsoo Cho | Chanhee Park | Seungshin Lee | Taeuk Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Conversational agents have traditionally been developed for either task-oriented dialogue (TOD) or open-ended chitchat, with limited progress in unifying the two. Yet, real-world conversations naturally involve fluid transitions between these modes. To address this gap, we introduce TACT (TOD-And-Chitchat Transition), a dataset designed for transition-aware dialogue modeling that incorporates structurally diverse and integrated mode flows. TACT supports both user- and agent-driven mode switches, enabling robust modeling of complex conversational dynamics.To evaluate an agent’s ability to initiate and recover from mode transitions, we propose two new metrics—Switch and Recovery.Models trained on TACT outperform baselines in both intent detection and mode transition handling. Moreover, applying Direct Preference Optimization (DPO) to TACT-trained models yields additionalgains, achieving 75.74% joint mode-intent accuracy and a 70.1% win rate against GPT-4o in human evaluation.These results demonstrate that pairing structurally diverse data with DPO enhances response quality and transition control, paving the way for more proactive and transition-aware conversational agents.

MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation
Jungyeon Lee | Lee Kangmin | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2025

Knowledge conflict often arises in retrieval-augmented generation (RAG) systems, where retrieved documents may be inconsistent with one another or contradict the model’s parametric knowledge.Existing benchmarks for investigating the phenomenon have notable limitations, including a narrow focus on the question answering setup, heavy reliance on entity substitution techniques, and a restricted range of conflict types. To address these issues, we propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts, while ensuring interpretability through the explicit relational structure of KGs.Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict: both open-source and proprietary models struggle with conflict detection—especially when multi-hop reasoning is required—and often fail to pinpoint the exact source of contradictions.Finally, we present in-depth analyses that serve as a foundation for improving LLMs in integrating diverse, sometimes even conflicting, information.

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Seunghee Kim | Changhyeon Kim | Taeuk Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by urging them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels—Easy, Medium, and Hard—facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed to prevent the disregard of any modality. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analysis to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.

Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models
Hwiyeong Lee | Uiji Hwang | Hyelim Lim | Taeuk Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models often retain unintended content, prompting growing interest in knowledge unlearning.Recent approaches emphasize localized unlearning, restricting parameter updates to specific regions in an effort to remove target knowledge while preserving unrelated general knowledge. However, their effectiveness remains uncertain due to the lack of robust and thorough evaluation of the trade-off between the competing goals of unlearning.In this paper, we begin by revisiting existing localized unlearning approaches. We then conduct controlled experiments to rigorously evaluate whether local parameter updates causally contribute to unlearning.Our findings reveal that the set of parameters that must be modified for effective unlearning is not strictly determined, challenging the core assumption of localized unlearning that parameter locality is inherently indicative of effective knowledge removal.

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
Jisu Kim | Youngwoo Shin | Uiji Hwang | Jihun Choi | Richeng Xuan | Taeuk Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions. While recent studies have leveraged large language models (LLMs) to handle idioms across various tasks, e.g., idiom-containing sentence generation and idiomatic machine translation, little is known about the underlying mechanisms of idiom processing in LLMs, particularly in multilingual settings. To this end, we introduce MIDAS, a new large-scale dataset of idioms in six languages, each paired with its corresponding meaning. Leveraging this resource, we conduct a comprehensive evaluation of LLMs’ idiom processing ability, identifying key factors that influence their performance. Our findings suggest that LLMs rely not only on memorization, but also adopt a hybrid approach that integrates contextual cues and reasoning, especially when processing compositional idioms. This implies that idiom understanding in LLMs emerges from an interplay between internal knowledge retrieval and reasoning-based inference.

ENGinius: A Bilingual LLM Optimized for Plant Construction Engineering
Wooseong Lee | Minseo Kim | Taeil Hur | Gyeong Hwan Jang | Woncheol Lee | Maro Na | Taeuk Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

Recent advances in large language models (LLMs) have drawn attention for their potential to automate and optimize processes across various sectors.However, the adoption of LLMs in the plant construction industry remains limited, mainly due to its highly specialized nature and the lack of resources for domain-specific training and evaluation.In this work, we propose ENGinius, the first LLM designed for plant construction engineering.We present procedures for data construction and model training, along with the first benchmarks tailored to this underrepresented domain.We show that ENGinius delivers optimized responses to plant engineers by leveraging enriched domain knowledge.We also demonstrate its practical impact and use cases, such as technical document processing and multilingual communication.

2024

Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts
Youna Kim | Hyuhng Joon Kim | Cheonbok Park | Choonghyun Park | Hyunsoo Cho | Junyeob Kim | Kang Min Yoo | Sang-goo Lee | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2024

When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge the gap between external knowledge and the LLMs’ parametric knowledge.Recent research has been developed to amplify contextual knowledge over the parametric knowledge of LLMs with contrastive decoding approaches.While these approaches could yield truthful responses when relevant context is provided, they are prone to vulnerabilities when faced with noisy contexts.We extend the scope of previous studies to encompass noisy contexts and propose adaptive contrastive decoding (ACD) to leverage contextual influence effectively.ACD demonstrates improvements in open-domain question answering tasks compared to baselines, especially in robustness by remaining undistracted by noisy contexts in retrieval-augmented generation.

Revisiting the Impact of Pursuing Modularity for Code Generation
Deokyeong Kang | KiJung Seo | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2024

Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.

Hyper-CL: Conditioning Sentence Representations with Hypernetworks
Young Yoo | Jii Cha | Changhyeon Kim | Taeuk Kim
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives.In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations.In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions.Evaluation on two representative conditioning benchmarks, namely conditional semantic text similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations, showcasing its computational efficiency at the same time.We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms.

Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Seonghoon Lim | Taejun Yun | Jinhyeon Kim | Jihun Choi | Taeuk Kim
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there still exists ongoing debate about the mechanisms behind their effectiveness.In this work, we focus on one of promising assumptions about inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process.Our experimental findings show that the use of multiple source languages in XLT-a technique we term Multi-Source Language Training (MSLT)-leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically prove its effectiveness.

BlendX: Complex Multi-Intent Detection with Blended Patterns
Yejin Yoon | Jungyeon Lee | Kangsan Kim | Chanhee Park | Taeuk Kim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not accurately reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is an emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSNIPS have limitations in their formulation. To address these issues, we present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors, elevating both its complexity and diversity. For dataset construction, we utilize both rule-based heuristics as well as a generative tool—OpenAI’s ChatGPT—which is augmented with a similarity-driven strategy for utterance selection. To ensure the quality of the proposed datasets, we also introduce three novel metrics that assess the statistical properties of an utterance related to word count, conjunction use, and pronoun usage. Extensive experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets, highlighting the need to reexamine the current state of the MID field. The dataset is available at https://github.com/HYU-NLP/BlendX.

Aligning Language Models to Explicitly Handle Ambiguity
Hyuhng Joon Kim | Youna Kim | Cheonbok Park | Junyeob Kim | Choonghyun Park | Kang Min Yoo | Sang-goo Lee | Taeuk Kim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios. The data and code are available at https://github.com/heyjoonkim/APA.

2023

X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity
Taejun Yun | Jinhyeon Kim | Deokyeong Kang | Seonghoon Lim | Jihoon Kim | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2023

Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process. While English, due to its widespread usage, is typically regarded as the primary language for model adaption in various tasks, recent studies have revealed that the efficacy of XLT can be amplified by selecting the most appropriate source languages based on specific conditions. In this work, we propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT. Our approach is model-oriented, better reflecting the inner workings of foundation models. In addition, it requires only a moderate amount of raw text from candidate languages, distinguishing it from the majority of previous methods that rely on external resources. In experiments, we demonstrate that our method is more effective than baselines across diverse tasks. Specifically, it shows proficiency in ranking candidates for zero-shot XLT, achieving an improvement of 4.6% on average in terms of NDCG@3. We also provide extensive analyses that confirm the utility of sub-networks for XLT prediction.

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP
Hyuhng Kim | Hyunsoo Cho | Sang-Woo Lee | Junyeob Kim | Choonghyun Park | Sang-goo Lee | Kang Yoo | Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2023

When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led significant progress in computer vision, its application on language input still needs to be explored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers thorough viewpoints of the model’s generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from NLP literature, yielding valuable findings: We observe that UniDA methods originally designed for image input can be effectively transferred to the natural language domain while also underscoring the effect of adaptation difficulty in determining the model’s performance.

2022

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations
Kang Min Yoo | Junyeob Kim | Hyuhng Joon Kim | Hyunsoo Cho | Hwiyeol Jo | Sang-Woo Lee | Sang-goo Lee | Taeuk Kim
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Despite recent explosion of interests in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive.Intuitively, ground-truth labels should have as much impact in in-context learning (ICL) as supervised learning, but recent work reported that the input-label correspondence is significantly less important than previously thought.Intrigued by this counter-intuitive observation, we re-examine the importance of ground-truth labels in in-context learning.With the introduction of two novel metrics, namely Label-Correctness Sensitivity and Ground-truth Label Effect Ratio (GLER), we were able to conduct quantifiable analysis on the impact of ground-truth label demonstrations.Through extensive analyses, we find that the correct input-label mappings can have varying impacts on the downstream in-context learning performances, depending on the experimental configuration.Through additional studies, we identify key components, such as the verbosity of prompt templates and the language model size, as the controlling factor to achieve more noise-resilient ICL.

HYU at SemEval-2022 Task 2: Effective Idiomaticity Detection with Consideration at Different Levels of Contextualization
Youngju Joung | Taeuk Kim
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

We propose a unified framework that enables us to consider various aspects of contextualization at different levels to better identify the idiomaticity of multi-word expressions. Through extensive experiments, we demonstrate that our approach based on the inter- and inner-sentence context of a target MWE is effective in improving the performance of related models. We also share our experience in detail on the task of SemEval-2022 Tasks 2 such that future work on the same task can be benefited from this.

Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble
Hyunsoo Cho | Choonghyun Park | Jaewook Kang | Kang Min Yoo | Taeuk Kim | Sang-goo Lee
Findings of the Association for Computational Linguistics: EMNLP 2022

Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience.Most recent studies in OOD detection utilize the information from a single representation that resides in the penultimate layer to determine whether the input is anomalous or not.Although such a method is straightforward, the potential of diverse information in the intermediate layers is overlooked.In this paper, we propose a novel framework based on contrastive learning that encourages intermediate features to learn layer-specialized representations and assembles them implicitly into a single representation to absorb rich information in the pre-trained language model. Extensive experiments in various intent classification and OOD datasets demonstrate that our approach is significantly more effective than other works.

Revisiting the Practical Effectiveness of Constituency Parse Extraction from Pre-trained Language Models
Taeuk Kim
Proceedings of the 29th International Conference on Computational Linguistics

Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent paradigm that attempts to induce constituency parse trees relying only on the internal knowledge of pre-trained language models. While attractive in the perspective that similar to in-context learning, it does not require task-specific fine-tuning, the practical effectiveness of such an approach still remains unclear, except that it can function as a probe for investigating language models’ inner workings. In this work, we mathematically reformulate CPE-PLM and propose two advanced ensemble methods tailored for it, demonstrating that the new parsing paradigm can be competitive with common unsupervised parsers by introducing a set of heterogeneous PLMs combined using our techniques. Furthermore, we explore some scenarios where the trees generated by CPE-PLM are practically useful. Specifically, we show that CPE-PLM is more effective than typical supervised parsers in few-shot settings.

2021

Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models
Taeuk Kim | Bowen Li | Sang-goo Lee
Findings of the Association for Computational Linguistics: EMNLP 2021

As it has been unveiled that pre-trained language models (PLMs) are to some extent capable of recognizing syntactic concepts in natural language, much effort has been made to develop a method for extracting complete (binary) parses from PLMs without training separate parsers. We improve upon this paradigm by proposing a novel chart-based method and an effective top-K ensemble technique. Moreover, we demonstrate that we can broaden the scope of application of the approach into multilingual settings. Specifically, we show that by applying our method on multilingual PLMs, it becomes possible to induce non-trivial parses for sentences from nine languages in an integrated and language-agnostic manner, attaining performance superior or comparable to that of unsupervised PCFGs. We also verify that our approach is robust to cross-lingual transfer. Finally, we provide analyses on the inner workings of our method. For instance, we discover universal attention heads which are consistently sensitive to syntactic information irrespective of the input language.

Self-Guided Contrastive Learning for BERT Sentence Representations
Taeuk Kim | Kang Min Yoo | Sang-goo Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although BERT and its variants have reshaped the NLP landscape, it still remains unclear how best to derive sentence embeddings from such pre-trained Transformers. In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations. Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors. Moreover, we redesign the contrastive learning objective (NT-Xent) and apply it to sentence representation learning. We demonstrate with extensive experiments that our approach is more effective than competitive baselines on diverse sentence-related tasks. We also show it is efficient at inference and robust to domain shifts.

2020

IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?
Jaeyoul Shin | Taeuk Kim | Sang-goo Lee
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We propose a novel method that enables us to determine words that deserve to be emphasized from written text in visual media, relying only on the information from the self-attention distributions of pre-trained language models (PLMs). With extensive experiments and analyses, we show that 1) our zero-shot approach is superior to a reasonable baseline that adopts TF-IDF and that 2) there exist several attention heads in PLMs specialized for emphasis selection, confirming that PLMs are capable of recognizing important words in sentences.

Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
Bowen Li | Taeuk Kim | Reinald Kim Amplayo | Frank Keller
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperform existing approaches if there is no development set present. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.

2019

Don’t Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja
Kang Min Yoo | Taeuk Kim | Sang-goo Lee
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e. Hanja). We employ cross-lingual transfer learning in training word representations by leveraging the fact that Hanja is closely related to Chinese. We evaluate the intrinsic quality of representations learned through our approach using the word analogy and similarity tests. In addition, we demonstrate their effectiveness on several downstream tasks, including a novel Korean news headline generation task.

Summary Level Training of Sentence Rewriting for Abstractive Summarization
Sanghwan Bae | Taeuk Kim | Jihoon Kim | Sang-goo Lee
Proceedings of the 2nd Workshop on New Frontiers in Summarization

As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of extracting salient sentences from a document first and then paraphrasing the selected ones to generate a summary. However, the existing models in this framework mostly rely on sentence-level rewards or suboptimal labels, causing a mismatch between a training objective and evaluation metric. In this paper, we present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning. In addition, we incorporate BERT into our model, making good use of its ability on natural language understanding. In extensive experiments, we show that a combination of our proposed model and training procedure obtains new state-of-the-art performance on both CNN/Daily Mail and New York Times datasets. We also demonstrate that it generalizes better on DUC-2002 test set.

A Cross-Sentence Latent Variable Model for Semi-Supervised Text Sequence Matching
Jihun Choi | Taeuk Kim | Sang-goo Lee
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present a latent variable model for predicting the relationship between a pair of text sequences. Unlike previous auto-encoding–based approaches that consider each sequence separately, our proposed framework utilizes both sequences within a single model by generating a sequence that has a given relationship with a source sequence. We further extend the cross-sentence generating framework to facilitate semi-supervised training. We also define novel semantic constraints that lead the decoder network to generate semantically plausible and diverse sequences. We demonstrate the effectiveness of the proposed model from quantitative and qualitative experiments, while achieving state-of-the-art results on semi-supervised natural language inference and paraphrase identification.

2018

Element-wise Bilinear Interaction for Sentence Matching
Jihun Choi | Taeuk Kim | Sang-goo Lee
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

When we build a neural network model predicting the relationship between two sentences, the most general and intuitive approach is to use a Siamese architecture, where the sentence vectors obtained from a shared encoder is given as input to a classifier. For the classifier to work effectively, it is important to extract appropriate features from the two vectors and feed them as input. There exist several previous works that suggest heuristic-based function for matching sentence vectors, however it cannot be said that the heuristics tailored for a specific task generalize to other tasks. In this work, we propose a new matching function, ElBiS, that learns to model element-wise interaction between two vectors. From experiments, we empirically demonstrate that the proposed ElBiS matching function outperforms the concatenation-based or heuristic-based matching functions on natural language inference and paraphrase identification, while maintaining the fused representation compact.

SNU_IDS at SemEval-2018 Task 12: Sentence Encoder with Contextualized Vectors for Argument Reasoning Comprehension
Taeuk Kim | Jihun Choi | Sang-goo Lee
Proceedings of the 12th International Workshop on Semantic Evaluation

We present a novel neural architecture for the Argument Reasoning Comprehension task of SemEval 2018. It is a simple neural network consisting of three parts, collectively judging whether the logic built on a set of given sentences (a claim, reason, and warrant) is plausible or not. The model utilizes contextualized word vectors pre-trained on large machine translation (MT) datasets as a form of transfer learning, which can help to mitigate the lack of training data. Quantitative analysis shows that simply leveraging LSTMs trained on MT datasets outperforms several baselines and non-transferred models, achieving accuracies of about 70% on the development set and about 60% on the test set.

2017

A Syllable-based Technique for Word Embeddings of Korean Words
Sanghyuk Choi | Taeuk Kim | Jinseok Seol | Sang-goo Lee
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Word embedding has become a fundamental component to many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so it is not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean using a convolutional neural network, in which word representation is composed of trained syllable vectors. Our model successfully produces morphologically meaningful representation of Korean words compared to the original Skip-gram embeddings. The results also show that it is quite robust to the Out-of-Vocabulary problem.

Co-authors

Choonghyun Park 4

Deokyeong Kang 2

Changhyeon Kim 2

Seonghoon Lim 2

Cheonbok Park 2

Reinald Kim Amplayo 1

Sanghyuk Choi 1

Gyeong Hwan Jang 1

Youngju Joung 1

Seungshin Lee 1

Youngwoo Shin 1

Venues