2024
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
Seonmin Koo | Jinsung Kim | YoungJoon Jang | Chanjun Park | Heuiseok Lim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
As the utilization of Large Language Models (LLMs) becomes more widespread, there is a growing demand for their ability to handle more complex and longer external knowledge across various use cases. Most existing evaluations of the open-domain question answering (ODQA) task, which necessitates the use of external knowledge, focus solely on whether the model provides the correct answer. However, even when LLMs answer correctly, they often fail to provide an obvious source for their responses. Therefore, it is necessary to jointly evaluate and verify the correctness of the answers and the appropriateness of grounded evidence in complex external contexts. To address this issue, we examine the phenomenon of discrepancies in abilities across two distinct tasks—QA and evidence selection—when performed simultaneously, from the perspective of task alignment. To verify LLMs’ task alignment, we introduce a verification framework and resources considering both the semantic relevancy and structural diversity of the given long context knowledge. Through extensive experiments and detailed analysis, we provide insights into the task misalignment between QA and evidence selection. Our code and resources will be available upon acceptance.
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim | Wonho Song | Dahyun Kim | Yunsu Kim | Yungi Kim | Chanjun Park
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework. Evalverse enables individuals with limited knowledge of artificial intelligence to easily request LLM evaluations and receive detailed reports, facilitated by an integration with communication platforms like Slack. Thus, Evalverse serves as a powerful tool for the comprehensive assessment of LLMs, offering both researchers and practitioners a centralized and easily accessible evaluation framework. Finally, we also provide a demo video for Evalverse, showcasing its capabilities and implementation in a two-minute format.
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim | Gyoungjin Gim | Yungi Kim | Jihoo Kim | Byungju Kim | Wonseok Lee | Chanjun Park
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
This study presents a novel learning approach designed to enhance both the mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability, and thus that initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.
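To make the sequential CoT-to-PoT curriculum concrete, here is a minimal sketch. The `Example` container, the style tags, and the scheduling function are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of a SAAS-style sequential curriculum, assuming each
# training example is tagged as "cot" (natural-language rationale) or
# "pot" (executable program rationale). Dataset contents are hypothetical.

from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Example:
    question: str
    rationale: str   # CoT: prose reasoning steps; PoT: a Python program
    style: str       # "cot" or "pot"

def saas_schedule(data: List[Example]) -> Iterator[Example]:
    """Yield CoT examples first, then PoT examples, so the model learns
    mathematical reasoning before program-based problem solving."""
    yield from (ex for ex in data if ex.style == "cot")
    yield from (ex for ex in data if ex.style == "pot")

if __name__ == "__main__":
    toy = [
        Example("2+3?", "Add 2 and 3 to get 5.", "cot"),
        Example("2+3?", "print(2 + 3)", "pot"),
    ]
    for ex in saas_schedule(toy):
        print(ex.style, "->", ex.rationale)
```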
Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
Chanjun Park | Jaehyung Seo | Seolhwa Lee | Junyoung Son | Hyeonseok Moon | Sugyeong Eo | Chanhee Lee | Heuiseok Lim
Findings of the Association for Computational Linguistics: EACL 2024
The recent advancements in the realm of Automatic Speech Recognition (ASR) post-processing have been primarily driven by sequence-to-sequence paradigms. Despite their effectiveness, these methods often demand substantial amounts of data, necessitating the expensive recruitment of phonetic transcription experts to rectify the erroneous outputs of ASR systems, thereby creating the desired training data. Back TranScription (BTS) alleviates this issue by generating ASR inputs from clean text via a Text-to-Speech (TTS) system. While initial studies on BTS exhibited promise, they were constrained by a limited dataset of just 200,000 sentence pairs, leaving the scalability of this method in question. In this study, we delve into the potential scalability of BTS. We introduce the “Hyper-BTS” dataset, a corpus approximately five times larger than that utilized in prior research. Additionally, we present innovative criteria for categorizing error types within ASR post-processing. This not only facilitates a more comprehensive qualitative analysis, which was absent in preceding studies, but also enhances the understanding of ASR error patterns. Our empirical results, both quantitative and qualitative, suggest that the enlarged scale of the Hyper-BTS dataset sufficiently addresses a vast majority of the ASR error categories. We make the Hyper-BTS dataset publicly available.
Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
Hyeonseok Moon | Jaewook Lee | Sugyeong Eo | Chanjun Park | Jaehyung Seo | Heuiseok Lim
Findings of the Association for Computational Linguistics: EACL 2024
Educational question-answer generation has been extensively researched owing to its practical applicability. However, we have identified a persistent challenge concerning the evaluation of such systems. Existing evaluation methods often fail to produce objective results and instead exhibit a bias towards favoring high similarity to the ground-truth question-answer pairs. In this study, we demonstrate that these evaluation methods yield low human alignment and propose an alternative approach called Generative Interpretation (GI) to achieve more objective evaluations. Through experimental analysis, we reveal that GI outperforms existing evaluation methods in terms of human alignment and even achieves performance comparable to GPT-3.5 while using only BART-large.
Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation
Jungseob Lee | Hyeonseok Moon | Seungjun Lee | Chanjun Park | Sugyeong Eo | Hyunwoong Ko | Jaehyung Seo | Seungyoon Lee | Heuiseok Lim
Findings of the Association for Computational Linguistics: ACL 2024
Byte Pair Encoding (BPE) is an effective subword approach in machine translation across several languages. However, our analysis indicates that BPE is prone to over-segmentation in Korean, a morphologically rich language, which can erode word semantics and lead to semantic confusion during training. This semantic confusion, stemming from over-segmentation, ultimately contributes to a degradation of overall translation quality. To address this issue, we introduce Length-aware Subword Vocabulary Construction (LeVoC), a novel approach that strategically incorporates longer words into the vocabulary. By utilizing an external monolingual Korean corpus, LeVoC extracts and integrates long words, effectively preserving morphological information and reducing semantic confusion. Our experiments demonstrate that LeVoC not only significantly outperforms BPE but can also be applied to, and surpass, current state-of-the-art morpheme-aware subword tokenization methods. We provide evidence that the difficulty of translating sentences with long words in Korean is associated with morphological compositionality, and that LeVoC’s ability to reduce semantic confusion during training leads to improved translation quality.
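As a rough illustration of the core idea of adding long words from an external monolingual corpus to the subword vocabulary, consider the sketch below. The length and frequency thresholds, the budget, and the toy corpus are hypothetical; LeVoC's actual extraction criteria may differ.

```python
# A minimal sketch of length-aware vocabulary extension in the spirit of
# LeVoC: frequent long words harvested from a monolingual corpus are added
# to the base subword vocabulary so they survive as single tokens.

from collections import Counter

def extend_vocab(base_vocab: set, corpus_lines, min_len: int = 4,
                 min_freq: int = 5, budget: int = 1000) -> set:
    counts = Counter(w for line in corpus_lines for w in line.split())
    long_words = [(w, c) for w, c in counts.items()
                  if len(w) >= min_len and c >= min_freq]
    # Most frequent long words first, up to the extra-vocabulary budget.
    long_words.sort(key=lambda wc: -wc[1])
    return base_vocab | {w for w, _ in long_words[:budget]}

if __name__ == "__main__":
    corpus = ["대한민국 정부 는 오늘 발표했다"] * 6
    print(extend_vocab({"정부", "는"}, corpus))
```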
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
Jaehyung Seo | Jaewook Lee | Chanjun Park | SeongTae Hong | Seungjun Lee | Heuiseok Lim
Findings of the Association for Computational Linguistics: ACL 2024
The evolution of large language models (LLMs) has culminated in a multitask model paradigm where prompts drive the generation of user-specific outputs. However, this advancement has revealed a critical challenge: LLMs frequently produce outputs that violate socially acceptable commonsense standards in various scenarios. To address this gap in commonsense reasoning, we present KoCommonGEN v2, a fine-grained benchmark dataset focused on Korean commonsense reasoning. This dataset, enriched with human annotations, comprises multiple-choice questions across seven error categories, including commonsense memorization, numerical commonsense, toxic speech, and more, which are liable to undermine the reliability of LLMs’ commonsense reasoning capabilities. The empirical results show that LLMs struggle with Korean commonsense reasoning: with human accuracy benchmarked at approximately 85%, GPT-4’s performance lags at about 74%, and other LLMs demonstrate an average accuracy of around 42%. Our findings emphasize the need for targeted improvements in Korean commonsense reasoning within LLMs, paving the way for more socially and contextually sensitive AI models.
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
Seonmin Koo | Jinsung Kim | Chanjun Park | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2024
Grammatical error correction (GEC) is a practical real-world task in which systems have achieved strong results alongside the development of large language models (LLMs). However, these achievements have been primarily obtained in English, and performance remains comparatively low for non-English data such as Korean. We hypothesize that this insufficiency occurs because relying solely on the parametric knowledge of LLMs makes it difficult to thoroughly understand the given context in Korean GEC. Therefore, we propose a Knowledge-Augmented GEC (KAGEC) framework that incorporates evidential information from external sources into the prompt for the GEC task. KAGEC first extracts salient phrases from the given source and retrieves non-parametric knowledge based on these phrases, aiming to enhance the context-aware generation capabilities of LLMs. Furthermore, we conduct validations over fine-grained error types to identify those that require a retrieval-augmented approach when LLMs perform Korean GEC. According to the experimental results, most LLMs, including ChatGPT, demonstrate significant performance improvements when KAGEC is applied.
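A minimal sketch of this retrieve-then-prompt flow appears below. `extract_phrases` and `retrieve` are hypothetical stand-ins for the framework's phrase extractor and non-parametric retriever, not the paper's actual components.

```python
# A minimal sketch of a KAGEC-style prompt: extract salient phrases from the
# source, retrieve evidence for each, and prepend the evidence to the GEC
# instruction. Both helper functions are illustrative stand-ins.

def extract_phrases(source: str) -> list:
    # Stand-in: treat longer tokens as salient phrases.
    return [w for w in source.split() if len(w) >= 3]

def retrieve(phrase: str) -> str:
    # Stand-in for non-parametric retrieval (e.g., a search API or KB lookup).
    return f"[evidence about {phrase}]"

def build_kagec_prompt(source: str) -> str:
    evidence = "\n".join(retrieve(p) for p in extract_phrases(source))
    return (f"Evidence:\n{evidence}\n\n"
            f"Correct the grammatical errors in the Korean sentence:\n{source}")

if __name__ == "__main__":
    print(build_kagec_prompt("오늘 날시가 좋다"))
```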
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Hyeonseok Moon | Seungyoon Lee | SeongTae Hong | Seungjun Lee | Chanjun Park | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2024
Translating major-language resources to build minor-language resources has become a widely used approach. Particularly when translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation when implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed into the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation and an Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we achieve a considerable improvement in translation quality itself, along with its effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that enhances the performance of the trained model by 2.690 points for the web page ranking (WPR) task and 0.845 points for the question generation (QG) task in the XGLUE benchmark.
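The following sketch illustrates the concatenate-translate-decompose idea. The indicator token format `<1>`, `<2>` and the catalyst wording are illustrative assumptions; the paper's exact strings may differ.

```python
# A minimal sketch: pack data components into one sequence with a catalyst
# statement and indicator tokens, then split the translated sequence back
# into components. A real MT system would translate the packed sequence.

import re

CATALYST = "The following fields belong to one record."

def pack(components):
    body = " ".join(f"<{i + 1}> {c}" for i, c in enumerate(components))
    return f"{CATALYST} {body}"

def unpack(translated, n):
    parts = re.split(r"<(\d+)>", translated)
    # re.split yields [prefix, '1', text1, '2', text2, ...]; keep the texts.
    fields = [parts[i].strip() for i in range(2, len(parts), 2)]
    assert len(fields) == n, "decomposition failed; fall back to separate MT"
    return fields

if __name__ == "__main__":
    seq = pack(["What is BTS?", "A Korean boy band."])
    # Here a real MT system would translate `seq`; we echo it for illustration.
    print(unpack(seq.replace(CATALYST, "").strip(), 2))
```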
Proceedings of the Eighth Widening NLP Workshop
Atnafu Lambebo Tonja | Alfredo Gomez | Chanjun Park | Hellina Hailu Nigatu | Santosh T.Y.S.S | Tanvi Anand | Wiem Ben Rim
Proceedings of the Eighth Widening NLP Workshop
Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation
Dahyun Jung | Sugyeong Eo | Chanjun Park | Heuiseok Lim
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Critical error detection (CED) in machine translation is a task that aims to detect errors that significantly distort the intended meaning. However, existing CED studies lack explainability because they do not address the reasons behind critical errors. To address this limitation, we propose Explainable CED, a dataset that introduces the attributes of error explanation and correction for critical errors. Given the advantages of reducing time costs and mitigating human annotation bias, we leverage a large language model in the data construction process. To improve the quality of the dataset and mitigate hallucination, we compare responses from the model and introduce an additional data filtering method based on feedback scoring. Our experiments demonstrate that the dataset provides consistent explanations and revisions for errors, validating its reliability.
Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
Seungyoon Lee | Dong Kim | Dahyun Jung | Chanjun Park | Heuiseok Lim
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Large Language Models (LLMs) have significantly impacted various fields requiring advanced linguistic understanding, yet concerns regarding their inherent biases and ethical considerations have also increased. Notably, LLMs have been critiqued for perpetuating stereotypes against diverse groups based on race, sexual orientation, and other attributes. However, most research analyzing these biases has predominantly focused on communities where English is the primary language, neglecting the cultural and linguistic nuances of other societies. In this paper, we explore the inherent biases and toxicity of LLMs specifically within the social context of Korea. We devise a set of prompts that reflect major societal issues in Korea and assign varied personas to both ChatGPT and GPT-4 to assess the toxicity of the generated sentences. Our findings indicate that certain personas or prompt combinations consistently yield harmful content, highlighting the potential risks associated with specific persona-issue alignments within the Korean cultural framework. Furthermore, we find that GPT-4 can produce more than twice as much toxic content as ChatGPT under certain conditions.
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Sanghoon Kim | Dahyun Kim | Chanjun Park | Wonsung Lee | Wonho Song | Yunsu Kim | Hyeonwoo Kim | Yungi Kim | Hyeonju Lee | Jihoo Kim | Changbae Ahn | Seonghoon Yang | Sukyung Lee | Hyunbyung Park | Gyoungjin Gim | Mikyoung Cha | Hwalsuk Lee | Sunghun Kim
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes for efficient training and inference. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
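As a toy illustration of depthwise scaling, the sketch below duplicates a layer stack and trims the overlapping ends before concatenation; assuming the paper's reported configuration, dropping k=8 layers from each copy of a 32-layer base yields a 48-layer model. Layers are plain integers here; a real implementation would copy transformer blocks and then continue pretraining to recover performance.

```python
# A minimal sketch of depthwise scaling (DUS): keep the first (n - k) layers
# from one copy of the model and the last (n - k) layers from a second copy,
# then concatenate the two stacks.

def depth_up_scale(layers, k):
    head = layers[: len(layers) - k]   # copy 1 without its final k layers
    tail = layers[k:]                  # copy 2 without its initial k layers
    return head + tail

if __name__ == "__main__":
    base = list(range(32))             # a 32-layer base model
    scaled = depth_up_scale(base, 8)
    print(len(scaled))                 # 48 layers
```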
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Chanjun Park | Hyeonwoo Kim | Dahyun Kim | SeongHwan Cho | Sanghoon Kim | Sukyung Lee | Yungi Kim | Hwalsuk Lee
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. By incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated into the Korean LLM community. We perform a data leakage analysis that shows the benefit of private test sets, along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets a precedent for expanding LLM evaluation to foster more linguistic diversity.
Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation
Sugyeong Eo | Jungwoo Lim | Chanjun Park | DaHyun Jung | Seonmin Koo | Hyeonseok Moon | Jaehyung Seo | Heuiseok Lim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent machine translation (MT) systems have overcome language barriers for a wide range of users, yet they still carry the risk of critical meaning deviation. Critical error detection (CED) is a task that identifies the inherent risk of catastrophic meaning distortions in machine translation output. Given the importance of reflecting cultural elements when detecting critical errors, we introduce the culture-aware “Politeness” type for detecting English-Korean critical translation errors. In addition, we facilitate two tasks by providing multiclass labels: critical error detection and critical error type classification (CETC). Empirical evaluations reveal that our data augmentation approach, which uses a newly presented perturber, significantly outperforms existing baselines on both tasks. Further analysis highlights the significance of multiclass labeling by demonstrating its superior effectiveness compared to binary labels.
Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean
Seungyoon Lee | Chanjun Park | DaHyun Jung | Hyeonseok Moon | Jaehyung Seo | Sugyeong Eo | Heuiseok Lim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Counter-narrative generation, i.e., the generation of fact-based responses to hate speech with the aim of correcting discriminatory beliefs, has been demonstrated to be an effective method of combating hate speech. However, its effectiveness is limited by the resource-intensive nature of dataset construction, and existing work has focused only on major languages. To alleviate this problem, we propose Korean Hate Speech Counter Punch (KHSCP), a cost-effective counter-narrative generation method for the Korean language. To this end, we release the first counter-narrative generation dataset in Korean and pose two research questions. Under these questions, we propose an effective augmentation method and investigate the viability of using a large language model to overcome data scarcity in low-resource environments by leveraging existing resources. We conduct several experiments to verify the effectiveness of the proposed method. Our results reveal that applying pre-existing resources can improve generation performance by a significant margin. Through deep analysis of these experiments, this work demonstrates the possibility of overcoming the challenges of generating counter-narratives in low-resource environments.
2023
PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
Seungjun Lee | Yoonna Jang | Chanjun Park | Jungseob Lee | Jaehyung Seo | Hyeonseok Moon | Sugyeong Eo | Seounghoon Lee | Bernardo Yahya | Heuiseok Lim
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
English is acknowledged worldwide as a mode of communication. However, due to the absence of realistic practice scenarios, students learning English as a foreign language (EFL) typically have limited chances to converse and share feedback with others. In this paper, we propose PEEP-Talk, a real-world situational dialogue-based chatbot designed for English education. It naturally switches to a new topic or situation in response to out-of-topic utterances, which are common among English beginners. Furthermore, PEEP-Talk provides feedback scores on conversations as well as grammar error correction. We performed automatic and user evaluations to validate the performance and educational efficiency of our system. The results show that PEEP-Talk generates appropriate responses in various real-life situations while providing accurate feedback to learners. Moreover, we demonstrate a positive impact on English speaking, grammar, and English learning anxiety, implying that PEEP-Talk can lower the barrier to learning natural conversation in effective ways.
Improving Formality-Sensitive Machine Translation Using Data-Centric Approaches and Prompt Engineering
Seungjun Lee | Hyeonseok Moon | Chanjun Park | Heuiseok Lim
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
In this paper, we present the KU x Upstage team’s submission for the Special Task on Formality Control on Spoken Language Translation, which involves translating English into four languages with diverse grammatical formality markers. Our methodology comprises two primary components: 1) a language-specific data-driven approach, and 2) the generation of synthetic data through the employment of large-scale language models and empirically-grounded prompt engineering. By adapting methodologies and models to accommodate the unique linguistic properties of each language, we observe a notable enhancement in performance relative to the baseline, substantiating the heightened efficacy of data-driven approaches. Moreover, our devised prompt engineering strategy yields superior synthetic translation instances.
KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
Seonmin Koo | Chanjun Park | Jinsung Kim | Jaehyung Seo | Sugyeong Eo | Hyeonseok Moon | Heuiseok Lim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Automatic Speech Recognition (ASR) systems are instrumental across various applications, with their performance being critically tied to user satisfaction. Conventional evaluation metrics for ASR systems produce a single aggregate score, which is insufficient for understanding specific system vulnerabilities. Therefore, we aim to address the limitations of previous ASR evaluation methods by introducing the Korean Error Explainable Benchmark Dataset for ASR and Post-processing (KEBAP). KEBAP enables comprehensive analysis of ASR systems at both the speech and text levels, thereby facilitating a more balanced assessment encompassing speech recognition accuracy and user readability. KEBAP provides 37 newly defined speech-level resources incorporating diverse noise environments and speaker-characteristic categories, and presents 13 distinct text-level error types. This paper demonstrates detailed statistical analyses of colloquial noise categories and textual error types. Furthermore, we conduct extensive validation and analysis of commercially deployed ASR systems, providing valuable insights into their performance. As a more fine-grained and real-world-centric evaluation method, KEBAP contributes to identifying and mitigating potential weaknesses in ASR systems.
CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
Jaehyung Seo | Hyeonseok Moon | Jaewook Lee | Sugyeong Eo | Chanjun Park | Heuiseok Lim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Korean morphological variations present unique opportunities and challenges in natural language processing (NLP), necessitating an advanced understanding of morpheme-based sentence construction. The complexity of morphological variations allows for diverse sentence forms based on the syntactic-semantic integration of functional morphemes (i.e., affixes) with lexical morphemes (i.e., roots). With this in mind, we propose CHEF, a method that replicates the morphological transformations inherent in sentences based on lexical and functional morpheme combinations through generative data augmentation. CHEF operates using a morpheme blender and a label discriminator, thereby enhancing the diversity of Korean sentence forms by capturing the properties of agglutination while maintaining label consistency. We conduct experiments on multiple Korean classification datasets, improving model performance in full- and few-shot settings. Our proposed method boosts performance beyond preceding data augmentation methods without incurring external data usage, and achieves results comparable to those yielded by augmentation techniques that use large language models (LLMs).
Proceedings of the Seventh Widening NLP Workshop (WiNLP 2023)
Bonaventure F. P. Dossou | Isidora Tourni | Hatem Haddad | Shaily Bhatt | Fatemehsadat Mireshghallah | Sunipa Dev | Tanvi Anand | Weijia Xu | Atnafu Lambebo Tonja | Alfredo Gomez | Chanjun Park
Proceedings of the Seventh Widening NLP Workshop (WiNLP 2023)
Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
DaHyun Jung | Sugyeong Eo | Chanjun Park | Hyeonseok Moon | Jaehyung Seo | Heuiseok Lim
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
2022
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
Jaehyung Seo | Seounghoon Lee | Chanjun Park | Yoonna Jang | Hyeonseok Moon | Sugyeong Eo | Seonmin Koo | Heuiseok Lim
Findings of the Association for Computational Linguistics: NAACL 2022
Recent natural language understanding (NLU) research on the Korean language has been vigorously maturing with the advancements of pretrained language models and datasets. However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i.e., generative commonsense reasoning). The two major challenges are inadequate data resources for developing generative commonsense reasoning with regard to Korean linguistic features and for evaluating language models, both of which are necessary for natural language generation (NLG). To solve these problems, we propose a text-generation dataset for Korean generative commonsense reasoning and language model evaluation. In this work, a semi-automatic dataset construction approach filters out contents inexplicable to commonsense, ascertains quality, and reduces the cost of building the dataset. We also present an in-depth analysis of the generation results of language models with various evaluation metrics along with human-annotated scores. The whole dataset is publicly available at https://aihub.or.kr/opendata/korea-university.
Priming Ancient Korean Neural Machine Translation
Chanjun Park | Seolhwa Lee | Jaehyung Seo | Hyeonseok Moon | Sugyeong Eo | Heuiseok Lim
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In recent years, there has been an increasing need for the restoration and translation of historical languages. In this study, we attempt to translate historical records in ancient Korean based on neural machine translation (NMT). Inspired by priming, a cognitive science theory holding that exposure to one stimulus influences the response to another, we propose a novel priming ancient-Korean NMT (AKNMT) that uses bilingual subword embedding initialization with structural property awareness in the ancient documents. We obtain state-of-the-art results on the AKNMT task. To the best of our knowledge, we are the first to confirm the possibility of developing a human-centric model that incorporates concepts from cognitive science and to analyze the results from the perspective of interference and cognitive dissonance theory.
Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing
Hyeonseok Moon | Chanjun Park | Seolhwa Lee | Jaehyung Seo | Jungseob Lee | Sugyeong Eo | Heuiseok Lim
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Automatic post-editing (APE) is a research field that aims to automatically correct errors in the translated sentences produced by a machine translation system. APE research faces several limitations in data acquisition, as there is no official dataset for most language pairs, and the amount of data is restricted even for language pairs for which official data has been released, such as in WMT. To solve this problem and promote universal APE research regardless of whether APE data exist, this study proposes a method for automatically generating APE data from a parallel corpus based on a noising scheme. In particular, we propose a noising scheme based on human-mimicking errors that reflects a practical correction process at the human level. Through precise inspection, we derive the optimal noising schemes, which show substantial effectiveness; we also demonstrate that, depending on the type of noise, noising-scheme-based APE data generation may lead to inferior performance. In addition, we propose a dynamic noise injection strategy that enables the acquisition of robust error correction capability and demonstrate its effectiveness through comparative analysis. This study enables obtaining a high-performance APE model without human-generated data and can promote universal APE research for all language pairs targeting English.
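A minimal sketch of generating APE triplets from a parallel corpus via noising is shown below. The specific noise operations (deletion, duplication, character-order corruption) and their rates are illustrative choices; the paper derives its own human-mimicking noising schemes empirically.

```python
# A minimal sketch: inject noise into the target side of a parallel corpus so
# the noised sentence plays the role of the MT output and the clean target is
# the post-edited reference.

import random

def inject_noise(sentence, p=0.15, seed=0):
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        r = rng.random()
        if r < p / 3:
            continue                     # word deletion
        elif r < 2 * p / 3:
            out += [w, w]                # word duplication
        elif r < p and len(w) > 1:
            out.append(w[1:] + w[0])     # character-order corruption
        else:
            out.append(w)
    return " ".join(out)

def make_triplets(parallel):
    # (source, pseudo MT output, post-edit) triplets for APE training
    return [(src, inject_noise(tgt), tgt) for src, tgt in parallel]

if __name__ == "__main__":
    print(make_triplets([("Das ist gut.", "That is good.")]))
```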
FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
Chanjun Park | Yoonna Jang | Seolhwa Lee | Sungjin Park | Heuiseok Lim
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety when dealing with foreign languages, employing the humanoid robot NAO and various deep learning models. A persona-based dialogue system embedded in NAO provides an interesting and consistent multi-turn dialogue for users, and a grammar error correction system promotes improvement in users’ grammar skills. Thus, our system enables personalized learning based on persona dialogue and facilitates grammar learning through grammar error feedback. Furthermore, we verified whether FreeTalky provides practical help in alleviating xenoglossophobia by replacing the real human in the conversation with the NAO robot, through human evaluation.
PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities
Chanjun Park | Yoonna Jang | Seolhwa Lee | Jaehyung Seo | Kisu Yang | Heuiseok Lim
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations
Children with language disabilities face communication difficulties in daily life. They are often deprived of the opportunity to participate in social activities due to their difficulty in understanding or using natural language. In this regard, Augmentative and Alternative Communication (AAC) can be a practical means of communication for children with language disabilities. In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities. PicTalky can process both text and pictograms more accurately by connecting a series of neural-based NLP modules. Additionally, we perform quantitative and qualitative analyses on the modules of PicTalky. By using this service, it is expected that those suffering from language problems will be able to express their intentions or desires more easily and improve their quality of life. We have made the models freely available alongside a demonstration of the web interface. Furthermore, we implemented robotics AAC for the first time by applying PicTalky to the NAO robot.
KU X Upstage’s Submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
Sugyeong Eo | Chanjun Park | Hyeonseok Moon | Jaehyung Seo | Heuiseok Lim
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents KU X Upstage’s submission to the quality estimation (QE): critical error detection (CED) shared task in WMT22. We leverage the XLM-RoBERTa large model without utilizing any additional parallel data. To the best of our knowledge, we are the first to apply prompt-based fine-tuning to the QE task. To maximize the model’s language understanding capability, we reformulate the CED task to resemble the masked language model objective, which is a pre-training strategy of the language model. We design intuitive templates and label words, and include auxiliary descriptions such as demonstrations or Google Translate results in the input sequence. We further improve performance through a template ensemble, and as a result of the shared task, our approach achieves the best performance for both the English-German and Portuguese-English language pairs in the unconstrained setting.
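To make the prompt-based reformulation concrete, here is a minimal sketch of an MLM-style template with a verbalizer mapping label words to CED labels. The template text and label words are illustrative assumptions, not the submission's actual choices.

```python
# A minimal sketch of recasting CED as a masked-LM problem: the model scores
# candidate label words at the mask position, and a verbalizer maps the
# winning word to a CED label ("NOT" = no critical error, "ERR" = error).

TEMPLATE = "{src} {mt} The translation is {mask} ."
LABEL_WORDS = {"good": "NOT", "bad": "ERR"}  # verbalizer: word -> CED label

def build_input(src, mt, mask_token="<mask>"):
    # XLM-RoBERTa uses "<mask>" as its mask token.
    return TEMPLATE.format(src=src, mt=mt, mask=mask_token)

if __name__ == "__main__":
    print(build_input("Ich habe keine Zeit.", "I have plenty of time."))
```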
QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
Sugyeong Eo | Chanjun Park | Hyeonseok Moon | Jaehyung Seo | Gyeongmin Kim | Jungseob Lee | Heuiseok Lim
Proceedings of the 29th International Conference on Computational Linguistics
With recent advances in neural machine translation demonstrating its importance, research on quality estimation (QE) has been steadily progressing. QE aims to automatically predict the quality of machine translation (MT) output without reference sentences. Despite its high utility in the real world, several limitations remain concerning manual QE data creation: non-trivial costs inevitably incurred by the need for translation experts, and issues with data scaling and language expansion. To tackle these limitations, we present QUAK, a Korean-English synthetic QE dataset generated in a fully automatic manner. It consists of three sub-datasets, QUAK-M, QUAK-P, and QUAK-H, produced through three strategies that are relatively free from language constraints. Since each strategy requires no human effort, which facilitates scalability, we scale our data up to 1.58M sentence pairs for QUAK-P and QUAK-H and 6.58M for QUAK-M. As an experiment, we quantitatively analyze word-level QE results in various ways while performing statistical analysis. Moreover, we observe meaningful performance gains in QUAK-M and QUAK-P when adding data up to 1.58M, showing that datasets scaled in an efficient way also contribute to performance improvements.
Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?
SeungYoon Lee | Jungseob Lee | Chanjun Park | Sugyeong Eo | Hyeonseok Moon | Jaehyung Seo | Jeongbae Park | Heuiseok Lim
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
Rather than continuing a conversation based on personalized or implicit information, existing conversation systems generate dialogue by focusing only on superficial content. To address this problem, FoCus was recently released: a persona-knowledge grounded dialogue generation dataset that leverages Wikipedia knowledge and personal personas, focusing on landmarks provided by Google, enabling user-centered conversation. However, a closer empirical study is needed since research in this area is still in its early stages. We therefore raise two research questions about FoCus. First, “Is FoCus for conversation or question answering?”, to identify the structural problems of the dataset. Second, “Does the FoCus model perform real knowledge blending?”, to closely examine whether the model acquires actual knowledge. Our experiments show that the FoCus model could not correctly blend knowledge according to the input dialogue and that the dataset design is unsuitable for multi-turn conversation.
2021
Dealing with the Paradox of Quality Estimation
Sugyeong Eo | Chanjun Park | Hyeonseok Moon | Jaehyung Seo | Heuiseok Lim
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
In quality estimation (QE), the quality of a translation can be predicted by referencing the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there exists a paradox: constructing a dataset for creating a QE model requires non-trivial human labor and time, and may even require additional effort compared to the cost of constructing a parallel corpus. In this study, to address this paradox and make the various applications of QE available even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. As we generate the pseudo dataset, we conduct experiments using various external machine translators as test sets to verify the accuracy of the results objectively. The experimental results show that multilingual BART demonstrates the best performance, and we confirm the applicability of QE in LRLs using our pseudo-QE dataset construction methods.
Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
Chanjun Park | Sugyeong Eo | Hyeonseok Moon | Heuiseok Lim
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Most recent Natural Language Processing (NLP) studies are based on the Pretrain-Finetuning Approach (PFA), but small and medium-sized enterprises and companies with insufficient hardware face many limitations in serving NLP applications with such technology due to slow speed and insufficient memory. The latest PFA technologies also require large amounts of data, especially for low-resource languages, making them much more difficult to work with. We propose a new tokenization method, ONE-Piece, to address this limitation; it combines a morphology-aware subword tokenization method with a vocabulary construction method, used after probing, that has not been carefully considered before. Our proposed method can be used without modifying the model structure. We experiment by applying ONE-Piece to Korean, a morphologically rich and low-resource language, and derive an optimal subword tokenization result for Korean-English machine translation through a case study that combines the subword tokenization method, morphological segmentation, and vocabulary method. Through comparative experiments with all the tokenization methods currently used in NLP research, ONE-Piece achieves performance comparable to the current Korean-English machine translation state-of-the-art model.
Two Heads are Better than One? Verification of Ensemble Effect in Neural Machine Translation
Chanjun Park | Sungjin Park | Seolhwa Lee | Taesun Whang | Heuiseok Lim
Proceedings of the Second Workshop on Insights from Negative Results in NLP
In the field of natural language processing, ensembles are broadly known to be effective in improving performance. This paper analyzes how ensembles of neural machine translation (NMT) models affect performance by designing various experimental setups (i.e., intra-ensemble, inter-ensemble, and non-convergence ensemble). For an in-depth examination, we analyze each ensemble method with respect to several aspects, such as different attention models and vocabulary strategies. Experimental results show that ensembling does not always improve performance, and we report noteworthy negative findings.
BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
Chanjun Park | Jaehyung Seo | Seolhwa Lee | Chanhee Lee | Hyeonseok Moon | Sugyeong Eo | Heuiseok Lim
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
With the growing popularity of smart speakers, such as Amazon Alexa, speech is becoming one of the most important modes of human-computer interaction. Automatic speech recognition (ASR) is arguably the most critical component of such systems, as errors in speech recognition propagate to the downstream components and drastically degrade the user experience. A simple and effective way to improve speech recognition accuracy is to apply an automatic post-processor to the recognition result. However, training a post-processor requires parallel corpora created by human annotators, which are expensive and not scalable. To alleviate this problem, we propose Back TranScription (BTS), a denoising-based method that can create such corpora without human labor. Using a raw corpus, BTS corrupts the text using Text-to-Speech (TTS) and Speech-to-Text (STT) systems. A post-processing model can then be trained to reconstruct the original text given the corrupted input. Quantitative and qualitative evaluations show that a post-processor trained using our approach is highly effective in fixing non-trivial speech recognition errors such as mishandled foreign words. We present the generated parallel corpus and post-processing platform to make our results publicly available.
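A minimal sketch of the BTS round-trip is given below; `tts` and `stt` are hypothetical stubs standing in for whatever Text-to-Speech and Speech-to-Text systems the method plugs in.

```python
# A minimal sketch of BTS data generation: round-trip clean text through TTS
# and STT so the recognition errors introduced along the way become training
# input for an ASR post-processor. Both stubs below must be replaced with
# real systems.

def tts(text: str) -> bytes:
    raise NotImplementedError("call a real Text-to-Speech system here")

def stt(audio: bytes) -> str:
    raise NotImplementedError("call a real Speech-to-Text system here")

def back_transcribe(clean_corpus):
    """Yield (noisy, clean) pairs for training an ASR post-processor."""
    for clean in clean_corpus:
        noisy = stt(tts(clean))  # the round trip introduces realistic ASR errors
        yield noisy, clean
```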