Tuan-Dung Le

Also published as: Tuan Dung Le


2025

Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel, effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare codes, and full-code assignments. Our code is publicly available.
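The consistency-training objective described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the mean-squared-error form of the agreement term, and the weighting factor `lam` are all assumptions. The idea is simply that the multi-label ICD classifier is penalized both for wrong predictions and for disagreeing with itself across the original and acronym-expanded versions of a note.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(probs, labels, eps=1e-9):
    # Multi-label binary cross-entropy, averaged over ICD codes.
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)

def consistency_loss(p_orig, p_aug):
    # Agreement term: mean squared difference between predictions on the
    # original note and on the acronym-expanded (augmented) note.
    return sum((a - b) ** 2 for a, b in zip(p_orig, p_aug)) / len(p_orig)

def total_loss(logits_orig, logits_aug, labels, lam=1.0):
    # Supervised loss on both views plus the consistency regularizer
    # (hypothetical weighting; the actual combination may differ).
    p_o = [sigmoid(z) for z in logits_orig]
    p_a = [sigmoid(z) for z in logits_aug]
    return bce(p_o, labels) + bce(p_a, labels) + lam * consistency_loss(p_o, p_a)
```

When the two views yield identical predictions, the consistency term vanishes and only the supervised loss remains.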
This paper presents our approach to the ArchEHR shared task on generating answers to real-world patient questions grounded in evidence from electronic health records (EHRs). We investigate the zero-shot capabilities of general-purpose, domain-agnostic large language models (LLMs) in two key aspects: identifying essential supporting evidence and producing concise, coherent answers. To this end, we propose a two-stage pipeline: (1) evidence identification via test-time scaling (TTS) and (2) final-answer generation conditioned on the evidence selected in the previous stage. Our approach leverages high-temperature sampling to generate multiple outputs during the evidence selection phase. This TTS-based approach effectively explores a broader set of potential evidence, resulting in a significant improvement in the factuality score of the answers.
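The evidence-selection stage can be sketched as a simple vote-aggregation step over the multiple high-temperature samples. This is an assumed aggregation scheme for illustration only; the function name, the representation of evidence as sentence IDs, and the `min_votes` threshold are hypothetical and not taken from the paper.

```python
from collections import Counter

def select_evidence(sampled_outputs, min_votes=2):
    """Aggregate evidence across multiple high-temperature LLM samples.

    sampled_outputs: one list of candidate evidence sentence IDs per sample.
    An ID is kept if it was proposed in at least `min_votes` distinct samples,
    so spurious sentences that appear in only one sample are filtered out.
    """
    votes = Counter(sid for sample in sampled_outputs for sid in set(sample))
    return sorted(sid for sid, count in votes.items() if count >= min_votes)
```

For example, if three samples propose `[1, 2]`, `[2, 3]`, and `[2]`, only sentence 2 clears a two-vote threshold.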

2024

In this paper, we report our effort to tackle the challenge of extracting chemotimelines from EHR notes across a dataset of three cancer types. We focus on two subtasks: (1) detecting and classifying temporal relations given annotated chemotherapy events and time expressions, and (2) directly extracting patient chemotherapy timelines from EHR notes. We address both subtasks using large language models. Our best-performing methods in both subtasks use Flan-T5, an instruction-tuned language model. Our proposed system achieves the highest average score in both subtasks. Our results underscore the effectiveness of fine-tuning general-domain large language models on domain-specific and unseen tasks.
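Casting temporal-relation classification as a text-to-text problem for an instruction-tuned model like Flan-T5 might look as follows. This is a hypothetical prompt format; the wording, field layout, and the candidate relation labels shown are illustrative assumptions, not the prompts or label set actually used in the system.

```python
def make_relation_prompt(note, event, timex):
    # Hypothetical instruction-style input for a seq2seq model: the model is
    # asked to emit the temporal relation between a chemotherapy event and a
    # time expression, given the surrounding clinical note as context.
    return (
        "Classify the temporal relation between the event and the time "
        "expression in the clinical note.\n"
        f"Note: {note}\n"
        f"Event: {event}\n"
        f"Time expression: {timex}\n"
        "Relation:"
    )
```

At training time, each annotated (event, time expression) pair yields one such input paired with its gold relation label as the target string.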