Thomas Huang


2024

Yale at “Discharge Me!”: Evaluating Constrained Generation of Discharge Summaries with Unstructured and Structured Information
Vimig Socrates | Thomas Huang | Xuguang Ai | Soraya Fereydooni | Qingyu Chen | R Andrew Taylor | David Chartash
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

In this work, we propose our top-ranking (2nd place) pipeline for the generation of discharge summary subsections as a part of the BioNLP 2024 Shared Task 2: “Discharge Me!”. We evaluate both encoder-decoder and state-of-the-art decoder-only language models on the generation of two key sections of the discharge summary. To evaluate the ability of NLP methods to further alleviate the documentation burden on physicians, we also design a novel pipeline to generate the brief hospital course directly from structured information found in the EHR. Finally, we evaluate a constrained beam search approach to inject external knowledge about relevant patient problems into the text generation process. We find that a BioBART model fine-tuned on a larger fraction of the data without constrained beam search outperforms all other models.

2020

Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains
Tarek Sakakini | Jong Yoon Lee | Aditya Duri | Renato F.L. Azevedo | Victor Sadauskas | Kuangxiao Gu | Suma Bhat | Dan Morrow | James Graumlich | Saqib Walayat | Mark Hasegawa-Johnson | Thomas Huang | Ann Willemsen-Dunlap | Donald Halpin
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

Healthcare systems have increased patients’ exposure to their own health materials to enhance patients’ health levels, but this has been impeded by patients’ lack of understanding of their health material. We address potential barriers to their comprehension by developing a context-aware text simplification system for health material. Given the scarcity of annotated parallel corpora in healthcare domains, we design our system to be independent of a parallel corpus, complementing the availability of data-driven neural methods when such corpora are available. Our system compensates for the lack of direct supervision using a biomedical lexical database: Unified Medical Language System (UMLS). Compared to a competitive prior approach that uses a tool for identifying biomedical concepts and a consumer-directed vocabulary list, we empirically show the enhanced accuracy of our system due to improved handling of ambiguous terms. We also show the enhanced accuracy of our system over directly supervised neural methods in this low-resource setting. Finally, we show the direct impact of our system on laypeople’s comprehension of health material via a human-subjects study (n=160).

2014

Cross-media Cross-genre Information Ranking based on Multi-media Information Networks
Tongtao Zhang | Haibo Li | Hongzhao Huang | Heng Ji | Min-Hsuan Tsai | Shen-Fu Tsai | Thomas Huang
Proceedings of the Third Workshop on Vision and Language

2010

Enhancing Multi-lingual Information Extraction via Cross-Media Inference and Fusion
Adam Lee | Marissa Passantino | Heng Ji | Guojun Qi | Thomas Huang
Coling 2010: Posters

2009

Spherical Discriminant Analysis in Semi-supervised Speaker Clustering
Hao Tang | Stephen Chu | Thomas Huang
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers