2024
How to use Language Models for Synthetic Text Generation in Cerebrovascular Disease-specific Medical Reports
Byoung-Doo Oh | Gi-Youn Kim | Chulho Kim | Yu-Seop Kim
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)
The quantity and quality of data have a significant impact on the performance of artificial intelligence (AI). However, in the biomedical domain, data often contains sensitive information such as personal details, making it challenging to secure enough data for medical AI. Consequently, there is growing interest in synthetic data generation for medical AI. However, research has primarily focused on medical images, with little attention given to text-based data such as medical records. Therefore, this study explores the application of language models (LMs) for synthetic text generation in low-resource domains like medical records, and compares the results of synthetic text generation based on different LMs. To achieve this, we focused on two criteria for LM-based synthetic text generation of medical records using two keywords entered by the user: 1) the impact of the LM’s knowledge, and 2) the impact of the LM’s size. Additionally, we objectively evaluated the generated synthetic text with representative metrics such as BLEU and ROUGE, along with clinicians’ evaluations.
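The automatic metrics the abstract names can be illustrated with a small, self-contained sketch: the clipped unigram precision below is the n=1 component of BLEU, and ROUGE-L is computed from the longest common subsequence of tokens. The two report strings are invented for illustration only.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    # Clipped unigram precision: the n=1 modified-precision term of BLEU.
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return overlap / max(len(cand), 1)

def rouge_l_f1(candidate, reference):
    # ROUGE-L: F1 score over the longest common subsequence of tokens.
    a, b = candidate.split(), reference.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a):
        for j, wb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if wa == wb else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(a)][len(b)]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(a), lcs / len(b)
    return 2 * p * r / (p + r)

# Hypothetical synthetic vs. real report fragments, for illustration only.
synthetic = "patient shows acute infarction in left mca territory"
real = "patient shows acute infarction in the right mca territory"
print(round(unigram_precision(synthetic, real), 3))  # → 0.875
print(round(rouge_l_f1(synthetic, real), 3))         # → 0.824
```

In practice one would use a reference implementation (e.g. the BLEU scorer in NLTK or the `rouge-score` package) rather than this sketch, but the core computation is the same.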
2022
Applicability of Pretrained Language Models: Automatic Screening for Children’s Language Development Level
Byoung-doo Oh | Yoon-koung Lee | Yu-seop Kim
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)
The potential of children can be limited by language delay or language impairment. However, there are many instances where parents are unaware of a child’s condition and, as a result, the child does not receive appropriate treatment. Additionally, collecting children’s utterances to establish norms for language tests and evaluating children’s language development level takes experts a significant amount of time and work. To address these issues, dependable automated screening tools are required. In this paper, we used a pretrained LM to assist experts in quickly and objectively screening the language development level of children. Here, evaluating the language development level means verifying that a child has the language abilities appropriate for his or her age. To do this, we analyzed children’s utterances by age. Based on these findings, we use the standard deviation of the pretrained LM’s token probabilities as a score for screening children’s language development level. The experimental results showed very strong correlations between our proposed method and the Korean language test REVT (REVT-R, REVT-E), with Pearson correlation coefficients of 0.9888 and 0.9892, respectively.
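The scoring idea — use the dispersion of the LM's per-token probabilities as a screening score, then correlate it with a standardized test — can be sketched with the standard library. All numbers below (per-token probabilities and test scores) are hypothetical, purely for illustration; they are not the paper's data.

```python
import statistics

def probability_std(token_probs):
    # Score an utterance by the population standard deviation of the
    # LM's per-token probabilities (hypothetical values here).
    return statistics.pstdev(token_probs)

def pearson(xs, ys):
    # Pearson correlation coefficient between two score lists.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-utterance token probabilities and REVT-style scores.
lm_scores = [probability_std(p) for p in
             [[0.2, 0.8, 0.6], [0.1, 0.15, 0.2], [0.5, 0.9, 0.7]]]
revt_scores = [65, 40, 80]
print(round(pearson(lm_scores, revt_scores), 4))
```

With real data, `token_probs` would come from the pretrained LM's output distribution over each child utterance.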
2020
Various Approaches for Predicting Stroke Prognosis using Magnetic Resonance Imaging Text Records
Tak-Sung Heo | Chulho Kim | Jeong-Myeong Choi | Yeong-Seok Jeong | Yu-Seop Kim
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Stroke is one of the leading causes of death and disability worldwide. Stroke is treatable, but it often leaves lasting disability after treatment, so it must be prevented. To grasp the degree of disability caused by stroke, we use magnetic resonance imaging text records to predict stroke and measure performance using document-level and sentence-level representations. In our experiments, the document-level representation shows better performance.
Lightweight Text Classifier using Sinusoidal Positional Encoding
Byoung-Doo Oh | Yu-Seop Kim
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Large and complex models that require many parameters and much training time have recently been developed to solve various problems in natural language processing. This paper explores an efficient way to keep models from becoming too complicated while ensuring performance nearly equal to state-of-the-art models. We propose a single convolutional neural network (CNN) using sinusoidal positional encoding (SPE) for text classification. The SPE provides useful positional information about a word and enables a more efficient model architecture than previous CNN-based approaches. Our model can significantly reduce parameter size (by at least 67%) and training time (by up to 85%) while maintaining performance similar to CNN-based approaches on multiple benchmark datasets.
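The SPE the abstract refers to is the fixed sine/cosine scheme introduced for Transformers, added to word embeddings before the convolutional layers. A minimal sketch of the encoding (the sequence length and dimensionality below are arbitrary, for illustration only):

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Encoding for a 4-token sequence with 8-dimensional embeddings;
# each row would be added element-wise to the word embedding at that position.
pe = sinusoidal_positional_encoding(4, 8)
```

Because the encoding is a fixed function of position, it adds no trainable parameters, which is consistent with the paper's goal of a lightweight classifier.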
2017
Correlation Analysis of Chronic Obstructive Pulmonary Disease (COPD) and its Biomarkers Using the Word Embeddings
Byeong-Hun Yoon | Yu-Seop Kim
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Finding new biomarkers for specific diseases in clinical laboratories is very costly and time-consuming. In this study, to find new biomarkers most closely related to Chronic Obstructive Pulmonary Disease (COPD), a widely known respiratory disease, biomarkers known to be associated with respiratory diseases and COPD itself were converted into word embeddings, and their similarities were measured. We used Word2Vec, Canonical Correlation Analysis (CCA), and Global Vectors (GloVe) for word embedding. To replace clinical evaluation, the titles and abstracts of papers retrieved from Google Scholar were analyzed and quantified to estimate the performance of the word embedding models.
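Measuring similarity between a disease term's embedding and candidate biomarker embeddings typically reduces to cosine similarity between their vectors. A minimal sketch with made-up low-dimensional vectors (real Word2Vec or GloVe vectors would have hundreds of dimensions, and the biomarker names here are placeholders, not the paper's findings):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional embeddings, for illustration only.
copd = [0.8, 0.1, 0.3, 0.5]
biomarkers = {"fibrinogen": [0.7, 0.2, 0.4, 0.4],
              "crp": [0.1, 0.9, 0.2, 0.1]}
ranked = sorted(biomarkers,
                key=lambda b: cosine_similarity(copd, biomarkers[b]),
                reverse=True)
print(ranked)  # candidates ordered by similarity to the COPD vector
```

Ranking candidates this way gives a cheap first filter before any clinical follow-up.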
2016
Drop-out Conditional Random Fields for Twitter with Huge Mined Gazetteer
Eunsuk Yang | Young-Bum Kim | Ruhi Sarikaya | Yu-Seop Kim
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2015
Hallym: Named Entity Recognition on Twitter with Word Representation
Eun-Suk Yang | Yu-Seop Kim
Proceedings of the Workshop on Noisy User-generated Text
2014
Training a Korean SRL System with Rich Morphological Features
Young-Bum Kim | Heemoon Chae | Benjamin Snyder | Yu-Seop Kim
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2002
A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation
Yu-Seop Kim | Jeong-Ho Chang | Byoung-Tak Zhang
COLING 2002: The 19th International Conference on Computational Linguistics
2000
Machine translation systems: E-K, K-E, J-K, K-J
Yu Seop Kim | Sung Dong Kim | Seong Bae Park | Jong Woo Lee | Jeong Ho Chang | Kyu Baek Hwang | Min O Jang | Yung Taek Kim
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: User Studies
We present four kinds of machine translation systems in this description: E-K (English to Korean), K-E (Korean to English), J-K (Japanese to Korean), and K-J (Korean to Japanese). Among these, the E-K and K-J translation systems have been published commercially, and the other systems have finished development. This paper describes the structure and function of each system with figures and translation results.