2024
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin | Seonil Son | Jemin Park | Youngseok Kim | Hyungjong Noh | Yeonsoo Lee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The advent of scalable deep models and large datasets has improved the performance of Neural Machine Translation (NMT). Knowledge Distillation (KD) enhances efficiency by transferring knowledge from a teacher model to a more compact student model. However, KD approaches to Transformer architecture often rely on heuristics, particularly when deciding which teacher layers to distill from. In this paper, we introduce the “Align-to-Distill” (A2D) strategy, designed to address the feature mapping problem by adaptively aligning student attention heads with their teacher counterparts during training. The Attention Alignment Module (AAM) in A2D performs a dense head-by-head comparison between student and teacher attention heads across layers, turning the combinatorial mapping heuristics into a learning problem. Our experiments show the efficacy of A2D, demonstrating gains of up to +3.61 and +0.63 BLEU points for WMT-2022 De→Dsb and WMT-2014 En→De, respectively, compared to Transformer baselines. The code and data are available at https://github.com/ncsoft/Align-to-Distill.
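As a rough illustration of the head-by-head alignment idea (not the authors’ released implementation; see the repository above for that), the sketch below compares every student attention head with every teacher head and weights the per-pair divergences with a learnable, softmax-normalized alignment matrix. The class name, tensor shapes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAlignmentSketch(nn.Module):
    """Toy head-to-head alignment for attention-map distillation (illustrative only)."""

    def __init__(self, n_student_heads: int, n_teacher_heads: int):
        super().__init__()
        # One learnable logit per (student head, teacher head) pair; softmax over
        # teacher heads replaces hand-crafted layer-mapping heuristics.
        self.align_logits = nn.Parameter(torch.zeros(n_student_heads, n_teacher_heads))

    def forward(self, student_attn: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
        # student_attn: (B, H_s, T, T) attention maps gathered across student layers
        # teacher_attn: (B, H_t, T, T) attention maps gathered across teacher layers
        t = teacher_attn.detach()
        weights = F.softmax(self.align_logits, dim=-1)            # (H_s, H_t)

        # KL(teacher || student) for every (student head, teacher head) pair.
        log_s = student_attn.clamp_min(1e-9).log().unsqueeze(2)   # (B, H_s, 1, T, T)
        log_t = t.clamp_min(1e-9).log().unsqueeze(1)              # (B, 1, H_t, T, T)
        kl = (t.unsqueeze(1) * (log_t - log_s)).sum(-1).mean(-1)  # (B, H_s, H_t)

        # Distillation loss: alignment-weighted sum over teacher heads, mean over batch.
        return (kl * weights).sum(-1).mean()

# Example: 4 student heads distilling from 16 teacher heads on length-8 sequences.
aam = AttentionAlignmentSketch(n_student_heads=4, n_teacher_heads=16)
s = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)
t = torch.softmax(torch.randn(2, 16, 8, 8), dim=-1)
loss = aam(s, t)
loss.backward()
```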
2023
Persona Expansion with Commonsense Knowledge for Diverse and Consistent Response Generation
Donghyun Kim | Youbin Ahn | Wongyu Kim | Chanhee Lee | Kyungchan Lee | Kyong-Ho Lee | Jeonguk Kim | Donghoon Shin | Yeonsoo Lee
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Generating diverse and consistent responses is the ultimate goal of persona-based dialogue. Although many studies have been conducted, the generated responses tend to be generic and bland due to the personas’ limited descriptiveness. Therefore, it is necessary to expand the given personas for more attractive responses. However, indiscriminate expansion of personas threatens the consistency of responses and therefore reduces the interlocutor’s interest in the conversation. To alleviate this issue, we propose a consistent persona expansion framework that improves not only the diversity but also the consistency of persona-based responses. To do so, we define consistency criteria to avoid possible contradictions among personas: 1) Intra-Consistency and 2) Inter-Consistency. Then, we construct a silver profile dataset to teach the expansion model to conform to these consistency criteria. Finally, we propose a persona expansion model with an encoder-decoder structure, which considers the relatedness and consistency among personas. Our experiments on the Persona-Chat dataset demonstrate the superiority of the proposed framework.
Concept-based Persona Expansion for Improving Diversity of Persona-Grounded Dialogue
Donghyun Kim | Youbin Ahn | Chanhee Lee | Wongyu Kim | Kyong-Ho Lee | Donghoon Shin | Yeonsoo Lee
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
A persona-grounded dialogue model aims to improve the quality of responses to promote user engagement. However, because the given personas are mostly short and limited to only a few informative words, it is challenging to utilize them to generate diverse responses. To tackle this problem, we propose a novel persona expansion framework, Concept-based Persona eXpansion (CPX). CPX takes the original persona as input and generates expanded personas that contain conceptually rich content. CPX consists of two task modules: 1) a Concept Extractor and 2) a Sentence Generator. To train these modules, we exploit the duality of the two tasks with a commonsense dataset consisting of concept sets and the corresponding sentences that contain the given concepts. Extensive experiments on persona expansion and response generation show that our framework improves the diversity and richness of responses.
VARCO-MT: NCSOFT’s WMT’23 Terminology Shared Task Submission
Geon Woo Park | Junghwa Lee | Meiying Ren | Allison Shindell | Yeonsoo Lee
Proceedings of the Eighth Conference on Machine Translation
A lack of consistency in terminology translation undermines the quality of translation from even the best-performing neural machine translation (NMT) models, especially in narrow domains like literature, medicine, and video game jargon. Dictionaries containing terminologies and their translations are often used to improve consistency but are difficult to construct and incorporate. We accompany our submissions to the WMT ‘23 Terminology Shared Task with a description of our experimental setup and procedure, in which we propose a framework for terminology-aware machine translation. Our framework comprises an automatic terminology extraction process that constructs terminology-aware machine translation data in low-supervision settings and two model architectures with terminology constraints. Our two models outperform baseline models by 21.51%p and 19.36%p in terminology recall, respectively, on the Chinese to English WMT’23 Terminology Shared Task test data.
2022
HaRiM+: Evaluating Summary Quality with Hallucination Risk
Seonil (Simon) Son | Junsoo Park | Jeong-in Hwang | Junghwa Lee | Hyungjong Noh | Yeonsoo Lee
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
One of the challenges in developing a summarization model arises from the difficulty of measuring the factual inconsistency of the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested by Miao et al. (2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which only requires an off-the-shelf summarization model to compute the hallucination risk based on token likelihoods. Deploying it requires no additional training of models or ad-hoc modules, which usually need alignment to human judgments. For summary-quality estimation, HaRiM+ records state-of-the-art correlation with human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which demonstrates the merit of reusing summarization models, facilitates progress in both the automated evaluation and the generation of summaries.
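As a loose illustration only (not the HaRiM+ formula itself), the snippet below shows how the per-token likelihoods that such a risk measure builds on can be read off an off-the-shelf summarization model via Hugging Face Transformers; the checkpoint name, example strings, and the averaging at the end are assumptions for demonstration, and HaRiM+ additionally incorporates the reinterpreted overconfidence-regularizing term.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Any off-the-shelf summarizer can be used; this checkpoint is just an example.
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

def token_likelihoods(source: str, summary: str) -> torch.Tensor:
    """Probability the summarizer assigns to each summary token given the source."""
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(text_target=summary, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        # Passing labels force-decodes the summary; logits[i] predicts labels[i].
        logits = model(**enc, labels=labels).logits            # (1, len(summary), vocab)
    probs = logits.softmax(dim=-1)
    return probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)  # (1, len(summary))

# Low average likelihood is only a crude signal of risky (possibly hallucinated)
# content; HaRiM+ refines this with its hallucination-risk term.
print(token_likelihoods("The cat sat on the mat all day.", "A cat rested on a mat.").mean())
```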
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
Heuiseok Lim | Seungryong Kim | Yeonsoo Lee | Steve Lin | Paul Hongsuck Seo | Yumin Suh | Yoonna Jang | Jungwoo Lim | Yuna Hur | Suhyune Son
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
2018
Two-Step Training and Mixed Encoding-Decoding for Implementing a Generative Chatbot with a Small Dialogue Corpus
Jintae Kim | Hyeon-Gu Lee | Harksoo Kim | Yeonsoo Lee | Young-Gil Kim
Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG)