Rapid growth in the adoption of electronic health records (EHRs) has led to an unprecedented expansion in the availability of large longitudinal datasets. Large initiatives such as the Electronic Medical Records and Genomics (eMERGE) Network, the Patient-Centered Outcomes Research Network (PCORNet), and the Observational Health Data Science and Informatics (OHDSI) consortium have been established and have reported successful applications of secondary use of EHRs in clinical research and practice. In these applications, natural language processing (NLP) technologies have played a crucial role, as much of the detailed patient information in EHRs is embedded in narrative clinical documents. Meanwhile, a number of clinical NLP systems, such as MedLEE, MetaMap/MetaMap Lite, cTAKES, and MedTagger, have been developed and used to extract useful information from diverse types of clinical text, such as clinical notes, radiology reports, and pathology reports. Success stories in applying these tools have been reported widely. Despite the demonstrated success of NLP in the clinical domain, methodologies and tools developed for clinical NLP remain little known and underutilized by students and experts in the general NLP domain, mainly because of limited exposure to EHR data. Through this tutorial, we would like to introduce NLP methodologies and tools developed in the clinical domain and showcase real-world NLP applications in clinical research and practice at Mayo Clinic (ranked the No. 1 hospital nationally by U.S. News & World Report) and the University of Minnesota (ranked No. 41 among the best global universities by U.S. News & World Report).
We will review NLP techniques for solving clinical problems and facilitating clinical research, survey state-of-the-art clinical NLP tools, share our experience of collaborating with clinicians, and introduce publicly available EHR data and medical resources, before concluding with the opportunities and challenges of clinical NLP. The tutorial will provide an overview of the clinical background and does not presume knowledge of medicine or health care. The goal of this tutorial is to encourage NLP researchers in the general domain (as opposed to the specialized clinical domain) to contribute to this burgeoning area. In this tutorial, we will first present an overview of clinical NLP. We will then dive into two subareas of clinical NLP in clinical research, namely big data infrastructure for large-scale clinical NLP and advances of NLP in clinical research, and two subareas in clinical practice, namely clinical information extraction and patient cohort retrieval using EHRs. Around 70% of the tutorial will review clinical problems, cutting-edge methodologies, and real-world clinical NLP tools, while the remaining 30% will introduce use cases at Mayo Clinic and the University of Minnesota. Finally, we will conclude the tutorial with challenges and opportunities in this rapidly developing domain.
Domain-specific annotations for NLP are often centered on real-world applications of text, where incorrect annotations may be particularly unacceptable. For medical text, the process of manual chart review (of a patient’s medical record) is error-prone because of its complexity. We propose a staggered NLP-assisted approach to the refinement of clinical annotations: an interactive process that allows initial human judgments to be verified or falsified by comparison with an improving NLP system. We show on our internal Asthma Timelines dataset that this approach improves the quality of the human-produced clinical annotations.