Wonjin Yoon

Also published as: WonJin Yoon


2022

Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework
Wonjin Yoon | Richard Jackson | Elliot Ford | Vladimir Poroshin | Jaewoo Kang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

To assist the drug discovery and development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements based on our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and wraps several other BioNLP technologies into one coherent system.

KU_ED at SocialDisNER: Extracting Disease Mentions in Tweets Written in Spanish
Antoine Lain | Wonjin Yoon | Hyunjae Kim | Jaewoo Kang | Ian Simpson
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes the system we developed for the Social Media Mining for Health (SMM4H) 2022 SocialDisNER task. We used several types of pre-trained language models, trained on either Spanish biomedical literature or Spanish tweets. We show how performance varies with the quality of the tokenization and with the introduction of silver-standard annotations during training. Our model obtained a strict F1 of 80.3% on the test set, an improvement of +12.8% F1 (24.6 std) over the average results across all submissions to the SocialDisNER challenge.

2020

Answering Questions on COVID-19 in Real-Time
Jinhyuk Lee | Sean S. Yi | Minbyul Jeong | Mujeen Sung | WonJin Yoon | Yonghwa Choi | Miyoung Ko | Jaewoo Kang
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The recent outbreak of the novel coronavirus is wreaking havoc on the world, and researchers are struggling to combat it effectively. One reason the fight is difficult is the lack of information and knowledge. In this work, we outline our effort to help shrink this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to answer questions in real time. Our system also leverages information retrieval (IR) approaches to provide entity-level answers that complement QA models. We evaluate covidAsk on a manually created dataset called COVID-19 Questions, which is based on information from various sources, including the CDC and the WHO. We hope our system will aid researchers in their search for knowledge and information not only for COVID-19, but for future pandemics as well.