Huan Zhong
2022
How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling
Samuel Cahyawijaya | Bryan Wilie | Holy Lovenia | Huan Zhong | MingQian Zhong | Yuk-Yu Nancy Ip | Pascale Fung
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Large pre-trained language models (LMs) have been widely adopted in the biomedical and clinical domains, giving rise to many powerful LMs such as bio-lm and BioELECTRA. However, their applicability to real clinical use cases is hindered because pre-trained LMs are limited in processing long textual data with thousands of words, a common length for a clinical note. In this work, we explore long-range adaptation of such LMs with Longformer, allowing the LMs to capture context from longer clinical notes. We conduct experiments on three n2c2 challenge datasets and on a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this approach, achieving an F1-score improvement of ~10%. Based on our experiments, we conclude that capturing a longer clinical note interval benefits model performance, but the cut-off interval that achieves optimal performance differs across target variables.
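Longformer makes long inputs tractable by replacing full self-attention with a sliding-window (local) attention pattern plus a small number of global tokens. The sketch below only illustrates that general attention pattern; it is not the authors' code, and the function names, window size, and choice of global positions are illustrative assumptions.

```python
def longformer_attention_mask(seq_len, window, global_positions=()):
    """Build a seq_len x seq_len boolean mask where mask[i][j] is True
    if token i may attend to token j (hypothetical helper for illustration)."""
    half = window // 2  # each token sees `half` neighbours on each side
    mask = [[False] * seq_len for _ in range(seq_len)]

    # Local sliding-window attention: a band around the diagonal.
    for i in range(seq_len):
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True

    # Global tokens attend to every position, and every position attends to them.
    for g in set(global_positions):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


def count_attended(mask):
    """Count the token pairs that are allowed to attend to each other."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    local_only = longformer_attention_mask(8, window=2)
    with_global = longformer_attention_mask(8, window=2, global_positions=[0])
    print(count_attended(local_only), count_attended(with_global), 8 * 8)
```

For an 8-token sequence with a window of 2, this pattern allows 22 of the 64 pairs that full attention would compute, and marking position 0 as global raises that to 34. The number of attended pairs grows linearly with sequence length for a fixed window rather than quadratically, which is what lets long clinical notes fit in memory.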
Co-authors
- Samuel Cahyawijaya 1
- Pascale Fung 1
- Yuk-Yu Nancy Ip 1
- Holy Lovenia 1
- Bryan Wilie 1