Chulho Kim


2024

pdf bib
How to use Language Models for Synthetic Text Generation in Cerebrovascular Disease-specific Medical Reports
Byoung-Doo Oh | Gi-Youn Kim | Chulho Kim | Yu-Seop Kim
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)

The quantity and quality of data have a significant impact on the performance of artificial intelligence (AI). However, in the biomedical domain, data often contains sensitive information such as personal details, making it challenging to secure enough data for medical AI. Consequently, there is a growing interest in synthetic data generation for medical AI. However, research has primarily focused on medical images, with little given to text-based data such as medical records. Therefore, this study explores the application of language models (LMs) for synthetic text generation in low-resource domains like medical records. It compares the results of synthetic text generation based on different LMs. To achieve this, we focused on two criteria for LM-based synthetic text generation of medical records using two keywords entered by the user: 1) the impact of the LM’s knowledge, 2) the impact of the LM’s size. Additionally, we objectively evaluated the generated synthetic text, including representative metrics such as BLUE and ROUGE, along with clinician’s evaluations.

2020

pdf bib
Various Approaches for Predicting Stroke Prognosis using Magnetic Resonance Imaging Text Records
Tak-Sung Heo | Chulho Kim | Jeong-Myeong Choi | Yeong-Seok Jeong | Yu-Seop Kim
Proceedings of the 3rd Clinical Natural Language Processing Workshop

Stroke is one of the leading causes of death and disability worldwide. Stroke is treatable, but it is prone to disability after treatment and must be prevented. To grasp the degree of disability caused by stroke, we use magnetic resonance imaging text records to predict stroke and measure the performance according to the document-level and sentence-level representation. As a result of the experiment, the document-level representation shows better performance.