Statistical Section Segmentation in Free-Text Clinical Records

Michael Tepper; Daniel Capurro; Fei Xia; Lucy Vanderwende; Meliha Yetisgen-Yildiz

Abstract

Automatically segmenting and classifying clinical free text into sections is an important first step to automatic information retrieval, information extraction and data mining tasks, as it helps to ground the significance of the text within. In this work we describe our approach to automatic section segmentation of clinical records such as hospital discharge summaries and radiology reports, along with section classification into pre-defined section categories. We apply machine learning to the problems of section segmentation and section classification, comparing a joint (one-step) and a pipeline (two-step) approach. We demonstrate that our systems perform well when tested on three data sets, two for hospital discharge summaries and one for radiology reports. We then show the usefulness of section information by incorporating it in the task of extracting comorbidities from discharge summaries.

Anthology ID: L12-1605
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 2001–2008
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/1016_Paper.pdf
Cite (ACL): Michael Tepper, Daniel Capurro, Fei Xia, Lucy Vanderwende, and Meliha Yetisgen-Yildiz. 2012. Statistical Section Segmentation in Free-Text Clinical Records. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2001–2008, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): Statistical Section Segmentation in Free-Text Clinical Records (Tepper et al., LREC 2012)
PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/1016_Paper.pdf
