Thomas Searle
2020
Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset
Thomas Searle | Zina Ibrahim | Richard Dobson
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Clinical coding is currently a labour-intensive, error-prone, but critical administrative process whereby hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments setting new benchmark results. A popular dataset used in this task is MIMIC-III, a large database of clinical free text notes and their associated codes, amongst other data. We argue for the reconsideration of the validity of MIMIC-III’s assigned codes, as MIMIC-III has not undergone secondary validation. This work presents an open-source, reproducible experimental methodology for assessing the validity of EHR discharge summaries. We exemplify the methodology with MIMIC-III discharge summaries and show that the most frequently assigned codes in MIMIC-III are undercoded by up to 35%.
2019
MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation
Thomas Searle | Zeljko Kraljevic | Rebecca Bendayan | Daniel Bean | Richard Dobson
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
An interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model for biomedical domain text, and for efficiently collating accurate, research use case specific training data and subsequently training the model. Screencast demo available here: https://www.youtube.com/watch?v=lM914DQjvSo