Analysing Representations of Memory Impairment in a Clinical Notes Classification Model
Mark Ormerod | Jesús Martínez-del-Rincón | Neil Robertson | Bernadette McGuinness | Barry Devereux
Proceedings of the 18th BioNLP Workshop and Shared Task
Despite recent advances in the application of deep neural networks to various kinds of medical data, extracting information from unstructured textual sources remains a challenging task. The challenges of training and interpreting document classification models are amplified when dealing with small and highly technical datasets, as are common in the clinical domain. Using a dataset of de-identified clinical letters gathered at a memory clinic, we construct several recurrent neural network models for letter classification, and evaluate them on their ability to build meaningful representations of the documents and predict patients’ diagnoses. Additionally, we probe sentence embedding models in order to build a human-interpretable representation of the neural network’s features, using a simple and intuitive technique based on perturbative approaches to sentence importance. In addition to showing which sentences in a document are most informative about the patient’s condition, this method reveals the types of sentences that lead the model to make incorrect diagnoses. Furthermore, we identify clusters of sentences in the embedding space that correlate strongly with importance scores for each clinical diagnosis class.
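As a minimal sketch of what a perturbative approach to sentence importance can look like, the example below removes one sentence at a time from a letter and records how much the predicted probability of a target class drops. The abstract does not specify the exact perturbation or scoring used, so leave-one-out removal is an assumption here. The `toy_classifier` function, its keyword cues, and the example letter are hypothetical stand-ins for a trained recurrent model and real clinical data.

```python
import math
from typing import Callable, List, Sequence


def toy_classifier(sentences: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a trained letter classifier.

    Returns [P(no impairment), P(impairment)] from simple keyword counts.
    """
    cues = ("forget", "memory loss", "repeats")  # illustrative cues only
    text = " ".join(sentences).lower()
    logit = sum(text.count(cue) for cue in cues) - 1.0
    p_impaired = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p_impaired, p_impaired]


def sentence_importance(
    sentences: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
    target: int,
) -> List[float]:
    """Score each sentence by the drop in the target-class probability
    when that sentence is removed from the document (leave-one-out)."""
    base = predict(sentences)[target]
    scores = []
    for i in range(len(sentences)):
        perturbed = list(sentences[:i]) + list(sentences[i + 1:])
        scores.append(base - predict(perturbed)[target])
    return scores


# Fabricated example letter, for illustration only.
letter = [
    "The patient attended clinic with her daughter.",
    "She forgets recent conversations and repeats questions.",
    "Blood pressure was normal.",
]

scores = sentence_importance(letter, toy_classifier, target=1)
for sentence, score in zip(letter, scores):
    print(f"{score:+.3f}  {sentence}")
```

A positive score means that removing the sentence lowers the model's confidence in the target class, so the sentence supports that prediction. A negative score marks a sentence that pulls the model away from it, which is one way to surface the sentences behind an incorrect diagnosis.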