Ben Hachey

2025

Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review
Yidong Gan | Maciej Rybinski | Ben Hachey | Jonathan K. Kummerfeld
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Clinical coding is crucial for healthcare billing and data analysis. Manual clinical coding is labour-intensive and error-prone, which has motivated research towards full automation of the process. However, our analysis, based on US English electronic health records and automated coding research using these records, shows that widely used evaluation methods are not aligned with real clinical contexts. For example, evaluations that focus on the top 50 most common codes are an oversimplification, as there are thousands of codes used in practice. This position paper aims to align AI coding research more closely with practical challenges of clinical coding. Based on our analysis, we offer eight specific recommendations, suggesting ways to improve current evaluation methods. Additionally, we propose new AI-based methods beyond automated coding, suggesting alternative approaches to assist clinical coders in their workflows.

Less is More: Explainable and Efficient ICD Code Prediction with Clinical Entities
James C. Douglas | Yidong Gan | Ben Hachey | Jonathan K. Kummerfeld
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Clinical coding, assigning standardized codes to medical notes, is critical for epidemiological research, hospital planning, and reimbursement. Neural coding models generally process entire discharge summaries, which are often lengthy and contain information that is not relevant to coding. We propose an approach that combines Named Entity Recognition (NER) and Assertion Classification (AC) to filter for clinically important content before supervised code prediction. On MIMIC-IV, a standard evaluation dataset, our approach achieves near-equivalent performance to a state-of-the-art full-text baseline while using only 22% of the content and reducing training time by over half. Additionally, mapping model attention to complete entity spans yields coherent, clinically meaningful explanations, capturing coding-relevant modifiers such as acuity and laterality. We release a newly annotated NER+AC dataset for MIMIC-IV, designed specifically for ICD coding. Our entity-centric approach lays a foundation for more transparent and cost-effective assisted coding.

2020

An Effective Transition-based Model for Discontinuous NER
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unlike widely used Named Entity Recognition (NER) data sets in generic domains, biomedical NER data sets often contain mentions consisting of discontinuous spans. Conventional sequence tagging techniques encode Markov assumptions that are efficient but preclude recovery of these mentions. We propose a simple, effective transition-based model with generic neural encoding for discontinuous NER. Through extensive experiments on three biomedical data sets, we show that our model can effectively recognize discontinuous mentions without sacrificing the accuracy on continuous mentions.

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data. Often, the pretraining data used in these models are selected based on their subject matter, e.g., biology or computer science. Given the range of applications using social media text, and its unique language variety, we pretrain two models on tweets and forum text respectively, and empirically demonstrate the effectiveness of these two resources. In addition, we investigate how similarity measures can be used to nominate in-domain pretraining data. We publicly release our pretrained models at https://bit.ly/35RpTf0.

2019

Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

2018

Can adult mental health be predicted by childhood future-self narratives? Insights from the CLPsych 2018 Shared Task
Kylie Radford | Louise Lavrencic | Ruth Peters | Kim Kiely | Ben Hachey | Scott Nowson | Will Radford
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

The CLPsych 2018 Shared Task B explores how childhood essays can predict psychological distress throughout the author’s life. Our main aim was to build tools to help our psychologists understand the data, propose features and interpret predictions. We submitted two linear regression models: ModelA uses simple demographic and word-count features, while ModelB uses linguistic, entity, typographic, expert-gazetteer, and readability features. Our models perform best at younger prediction ages, with our best unofficial score at age 23: a disattenuated Pearson correlation of 0.426. This task is challenging, and although predictive performance is limited, we propose that tight integration of expertise across computational linguistics and clinical psychology is a productive direction.

2017

Learning to generate one-sentence biographies from Wikidata
Andrew Chisholm | Will Radford | Ben Hachey
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We investigate the generation of one-sentence Wikipedia biographies from facts derived from Wikidata slot-value pairs. We train a recurrent neural network sequence-to-sequence model with attention to select facts and generate textual summaries. Our model incorporates a novel secondary objective that helps ensure it generates sentences that contain the input facts. The model achieves a BLEU score of 41, improving significantly upon the vanilla sequence-to-sequence model and scoring roughly twice that of a simple template baseline. Human preference evaluation suggests the model is nearly as good as the Wikipedia reference. Manual analysis explores content selection, suggesting the model can trade the ability to infer knowledge against the risk of hallucinating incorrect information.

English Event Detection With Translated Language Features
Sam Wei | Igor Korostil | Joel Nothman | Ben Hachey
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose novel radical features from automatic translation for event extraction. Event detection is a complex language processing task for which it is expensive to collect training data, making generalisation challenging. We derive meaningful subword features from automatic translations into a target language. Results suggest this method is particularly useful when using languages with writing systems that facilitate easy decomposition into subword features, e.g., logograms and Cangjie. The best result combines logogram features from Chinese and Japanese with syllable features from Korean, providing an additional 3.0 points of F-score when added to state-of-the-art generalisation features on the TAC KBP 2015 Event Nugget task.
