John Wu
2024
Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning
John Wu | David Wu | Jimeng Sun
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Medical coding, the translation of unstructured clinical text into standardized medical codes, is a crucial but time-consuming healthcare practice. Though large language models (LLMs) could automate the coding process and improve the efficiency of such tasks, interpretability remains paramount for maintaining patient trust. Current efforts toward interpretability in medical coding applications rely heavily on label attention mechanisms, which often highlight extraneous tokens irrelevant to the ICD code. To facilitate accurate interpretability in medical language models, this paper leverages dictionary learning, which can efficiently extract sparsely activated representations from dense language model embeddings in superposition. Compared with common label attention mechanisms, our model goes beyond token-level representations by building an interpretable dictionary that enhances mechanistic explanations for each ICD code prediction, even when the highlighted tokens are medically irrelevant. We show that dictionary features are human interpretable, can elucidate the hidden meanings of upwards of 90% of medically irrelevant tokens, and can steer model behavior.
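As a rough illustration of the dictionary-learning step the abstract describes, the sketch below trains a sparse autoencoder over dense language-model embeddings so that each token is reconstructed from a few active dictionary features. The class name, dimensions, and L1 weight are illustrative assumptions, not the paper's actual architecture or hyperparameters.

    # Minimal sparse-autoencoder sketch of dictionary learning over LM embeddings.
    # Dimensions, the sparsity weight, and all names are illustrative assumptions,
    # not the configuration used in the paper.
    import torch
    import torch.nn as nn

    class SparseDictionary(nn.Module):
        def __init__(self, embed_dim=768, dict_size=8192):
            super().__init__()
            self.encoder = nn.Linear(embed_dim, dict_size)  # dense embedding -> feature activations
            self.decoder = nn.Linear(dict_size, embed_dim)  # columns act as dictionary atoms

        def forward(self, x):
            features = torch.relu(self.encoder(x))          # sparse, non-negative activations
            reconstruction = self.decoder(features)
            return features, reconstruction

    def loss_fn(x, reconstruction, features, l1_weight=1e-3):
        # Reconstruction term keeps the dictionary faithful to the embeddings;
        # the L1 term encourages each token to activate only a few features.
        recon = (x - reconstruction).pow(2).mean()
        sparsity = features.abs().mean()
        return recon + l1_weight * sparsity

In this framing, interpreting a prediction amounts to inspecting which dictionary features activate for a token, and steering corresponds to clamping or ablating a feature's activation before decoding back into the embedding space.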
2020
Similarity Analysis of Contextual Word Representation Models
John Wu | Yonatan Belinkov | Hassan Sajjad | Nadir Durrani | Fahim Dalvi | James Glass
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
This paper investigates contextual word representation models from the lens of similarity analysis. Given a collection of trained models, we measure the similarity of their internal representations and attention. Critically, these models come from vastly different architectures. We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation. The analysis reveals that models within the same family are more similar to one another, as may be expected. Surprisingly, different architectures have rather similar representations, but different individual neurons. We also observe differences in information localization in lower and higher layers and find that higher layers are more affected by fine-tuning on downstream tasks.
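As one concrete example of the kind of representation-similarity measure this analysis relies on, the sketch below computes linear CKA between activations collected from two models on the same tokens. It is an illustrative choice and not necessarily one of the paper's specific existing or novel measures.

    # Linear CKA between two sets of activations, a widely used similarity
    # measure for comparing internal representations. Illustrative only.
    import numpy as np

    def linear_cka(X, Y):
        # X: (n_tokens, d1), Y: (n_tokens, d2) -- activations from two models/layers
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
        cross = np.linalg.norm(X.T @ Y, 'fro') ** 2
        self_x = np.linalg.norm(X.T @ X, 'fro')
        self_y = np.linalg.norm(Y.T @ Y, 'fro')
        return cross / (self_x * self_y)

Measures of this form compare models without any external linguistic annotation, since they operate directly on the activation matrices.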