Chenwei Yan
2025
LLM Sensitivity Evaluation Framework for Clinical Diagnosis
Chenwei Yan | Xiangling Fu | Yuxuan Xiong | Tianyi Wang | Siu Cheung Hui | Ji Wu | Xien Liu
Proceedings of the 31st International Conference on Computational Linguistics
Large language models (LLMs) have demonstrated impressive performance across various domains. Clinical diagnosis, however, places higher demands on an LLM's reliability and sensitivity: the model should reason like a physician and remain sensitive to key medical information that affects diagnostic reasoning, since subtle variations can lead to different diagnoses. Yet existing work focuses mainly on the sensitivity of LLMs to irrelevant context and overlooks the importance of key information. In this paper, we investigate the sensitivity of LLMs, namely GPT-3.5, GPT-4, Gemini, Claude3 and LLaMA2-7b, to key medical information by introducing different perturbation strategies. The evaluation results highlight the limitations of current LLMs in remaining sensitive to key medical information for diagnostic decision-making. Future LLM development must focus on improving reliability, on sensitivity to key information, and on using that information effectively. These improvements will enhance human trust in LLMs and facilitate their practical application in real-world scenarios. Our code and dataset are available at https://github.com/chenwei23333/DiagnosisQA.
2020
Clinical-Coder: Assigning Interpretable ICD-10 Codes to Chinese Clinical Notes
Pengfei Cao | Chenwei Yan | Xiangling Fu | Yubo Chen | Kang Liu | Jun Zhao | Shengping Liu | Weifeng Chong
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
In this paper, we introduce Clinical-Coder, an online system that assigns ICD codes to Chinese clinical notes. ICD coding has been a research hotspot in clinical medicine, but the limited interpretability of predictions hinders its practical application. We use a Dilated Convolutional Attention network with N-gram Matching mechanism (DCANM) to capture semantic features of both non-contiguous words and contiguous n-grams, with a focus on explaining why each ICD code is predicted. Experiments demonstrate that our approach is effective and that the system can provide supporting information for clinical decision making.