Zae Myung Kim


2022

pdf bib
Understanding Iterative Revision from Human-Written Text
Wanyu Du | Vipul Raheja | Dhruv Kumar | Zae Myung Kim | Melissa Lopez | Dongyeop Kang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Writing is, by nature, a strategic, adaptive, and, more importantly, an iterative process. A crucial part of writing is editing and revising the text. Previous works on text revision have focused on defining edit intention taxonomies within a single domain or developing computational models with a single level of edit granularity, such as sentence-level edits, which differ from human’s revision cycles. This work describes IteraTeR: the first large-scale, multi-domain, edit-intention annotated corpus of iteratively revised text. In particular, IteraTeR is collected based on a new framework to comprehensively model the iterative text revisions that generalizes to a variety of domains, edit intentions, revision depths, and granularities. When we incorporate our annotated edit intentions, both generative and action-based text revision models significantly improve automatic evaluations. Through our work, we better understand the text revision process, making vital connections between edit intentions and writing quality, enabling the creation of diverse corpora to support computational modeling of iterative text revisions.

pdf bib
Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
Wanyu Du | Zae Myung Kim | Vipul Raheja | Dhruv Kumar | Dongyeop Kang
Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022)

Revision is an essential part of the human writing process. It tends to be strategic, adaptive, and, more importantly, iterative in nature. Despite the success of large language models on text revision tasks, they are limited to non-iterative, one-shot revisions. Examining and evaluating the capability of large language models for making continuous revisions and collaborating with human writers is a critical step towards building effective writing assistants. In this work, we present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3), which aims at achieving high quality text revisions with minimal human efforts by reading model-generated revisions and user feedbacks, revising documents, and repeating human-machine interactions. In R3, a text revision model provides text editing suggestions for human writers, who can accept or reject the suggested edits. The accepted edits are then incorporated into the model for the next iteration of document revision. Writers can therefore revise documents iteratively by interacting with the system and simply accepting/rejecting its suggested edits until the text revision model stops making further revisions or reaches a predefined maximum number of revisions. Empirical experiments show that R3 can generate revisions with comparable acceptance rate to human writers at early revision depths, and the human-machine interaction can get higher quality revisions with fewer iterations and edits. The collected human-model interaction dataset and system code are available at https://github.com/vipulraheja/IteraTeR. Our system demonstration is available at https://youtu.be/lK08tIpEoaE.

2021

pdf bib
Visualizing Cross‐Lingual Discourse Relations in Multilingual TED Corpora
Zae Myung Kim | Vassilina Nikoulina | Dongyeop Kang | Didier Schwab | Laurent Besacier
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

This paper presents an interactive data dashboard that provides users with an overview of the preservation of discourse relations among 28 language pairs. We display a graph network depicting the cross-lingual discourse relations between a pair of languages for multilingual TED talks and provide a search function to look for sentences with specific keywords or relation types, facilitating ease of analysis on the cross-lingual discourse relations.

pdf bib
Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
Zae Myung Kim | Laurent Besacier | Vassilina Nikoulina | Didier Schwab
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
A Multilingual Neural Machine Translation Model for Biomedical Data
Alexandre Bérard | Zae Myung Kim | Vassilina Nikoulina | Eunjeong Lucy Park | Matthias Gallé
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.

pdf bib
PATQUEST: Papago Translation Quality Estimation
Yujin Baek | Zae Myung Kim | Jihyung Moon | Hyunjoong Kim | Eunjeong Park
Proceedings of the Fifth Conference on Machine Translation

This paper describes the system submitted by Papago team for the quality estimation task at WMT 2020. It proposes two key strategies for quality estimation: (1) task-specific pretraining scheme, and (2) task-specific data augmentation. The former focuses on devising learning signals for pretraining that are closely related to the downstream task. We also present data augmentation techniques that simulate the varying levels of errors that the downstream dataset may contain. Thus, our PATQUEST models are exposed to erroneous translations in both stages of task-specific pretraining and finetuning, effectively enhancing their generalization capability. Our submitted models achieve significant improvement over the baselines for Task 1 (Sentence-Level Direct Assessment; EN-DE only), and Task 3 (Document-Level Score).

2015

pdf bib
Temporal Information Extraction from Korean Texts
Young-Seob Jeong | Zae Myung Kim | Hyun-Woo Do | Chae-Gyun Lim | Ho-Jin Choi
Proceedings of the Nineteenth Conference on Computational Natural Language Learning