Dayeon Ki


2023

pdf bib
An Integrated Search System for Korea Weather Data
Jinkyung Jo | Dayeon Ki | Soyoung Yoon | Minjoon Seo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

We introduce WeatherSearch, an integrated search system deployed at the Korea Meteorological Administration (KMA). WeatherSearch enables users to retrieve all the relevant data for weather forecasting from a massive weather database with simple natural language queries. We carefully design and conduct multiple expert surveys and interviews for template creation and apply data augmentation techniques including template filling to collect 4 million data points with minimal human labors. We then finetune mT5 on the collected dataset and achieve an average MRR of 0.66 and an average Recall of 0.82. We also discuss weather-data-specific characteristics that should be taken into account for creating such a system. We hope our paper serves as a simple and effective guideline for those designing similar systems in other regions of the world.

pdf bib
Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints
Yujin Baek | Koanho Lee | Dayeon Ki | Cheonbok Park | Hyoung-Gyu Lee | Jaegul Choo
Findings of the Association for Computational Linguistics: ACL 2023

Lexically-constrained NMT (LNMT) aims to incorporate user-provided terminology into translations. Despite its practical advantages, existing work has not evaluated LNMT models under challenging real-world conditions. In this paper, we focus on two important but understudied issues that lie in the current evaluation process of LNMT studies. The model needs to cope with challenging lexical constraints that are “homographs” or “unseen” during training. To this end, we first design a homograph disambiguation module to differentiate the meanings of homographs. Moreover, we propose PLUMCOT which integrates contextually rich information about unseen lexical constraints from pre-trained language models and strengthens a copy mechanism of the pointer network via direct supervision of a copying score. We also release HOLLY, an evaluation benchmark for assessing the ability of model to cope with “homographic” and “unseen” lexical constraints. Experiments on HOLLY and the previous test setup show the effectiveness of our method. The effects of PLUMCOT are shown to be remarkable in “unseen” constraints. Our dataset is available at https://github.com/papago-lab/HOLLY-benchmark.