Graciela Hernandez


2024

Automatic sentence segmentation of clinical record narratives in real-world data
Dongfang Xu | Davy Weissenbacher | Karen O’Connor | Siddharth Rawal | Graciela Hernandez
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Sentence segmentation is a linguistic task and is widely used as a pre-processing step in many NLP applications. The need for sentence segmentation is particularly pronounced in clinical notes, where ungrammatical and fragmented texts are common. We propose a straightforward and effective sequence labeling classifier to predict sentence spans using a dynamic sliding window based on the prediction of each input sequence. This sliding window algorithm allows our approach to segment long text sequences on the fly. To evaluate our approach, we annotated 90 clinical notes from the MIMIC-III dataset. Additionally, we tested our approach on five other datasets to assess its generalizability and compared its performance against state-of-the-art systems on these datasets. Our approach outperformed all the systems, achieving an F1 score that is 15% higher than the next best-performing system on the clinical dataset.
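
The abstract does not spell out how the dynamic sliding window works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the labeler marks sentence-final tokens inside each window and that the next window starts right after the last complete sentence found, so an unfinished sentence at the window edge is re-read with more context. The names `toy_labeler`, `segment`, and `window` are hypothetical, and the punctuation rule stands in for a trained sequence labeling model.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token offsets


def toy_labeler(tokens: List[str]) -> List[int]:
    """Hypothetical stand-in for a trained sequence labeler.

    Returns the indices (relative to the window) of sentence-final tokens.
    """
    return [i for i, tok in enumerate(tokens) if tok in {".", "?", "!"}]


def segment(
    tokens: List[str],
    label_ends: Callable[[List[str]], List[int]] = toy_labeler,
    window: int = 8,
) -> List[Span]:
    """Segment a long token sequence with a prediction-driven sliding window."""
    spans: List[Span] = []
    start = 0
    while start < len(tokens):
        chunk = tokens[start:start + window]
        ends = label_ends(chunk)
        if not ends:
            # No boundary predicted: emit the whole window as one span
            # so the loop always makes progress (an assumed fallback).
            spans.append((start, start + len(chunk)))
            start += len(chunk)
            continue
        prev = start
        for end in ends:
            # Each predicted sentence-final token closes one span.
            spans.append((prev, start + end + 1))
            prev = start + end + 1
        if start + len(chunk) >= len(tokens) and prev < len(tokens):
            # Last window: keep the trailing fragment as its own span.
            spans.append((prev, len(tokens)))
            prev = len(tokens)
        # Dynamic step: the next window begins after the last complete
        # sentence, so the step size depends on the prediction.
        start = prev
    return spans


note = "Pt stable . No acute distress . Follow up in 2 weeks".split()
sentences = [" ".join(note[s:e]) for s, e in segment(note, window=5)]
```

Because the step size follows the predicted boundaries rather than a fixed stride, text of any length can be processed window by window without splitting a sentence at an arbitrary cut point, except in the fallback case where a window contains no predicted boundary.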