Chun-nan Hsu

Also published as: Chun-Nan Hsu


2024

MiDRED: An Annotated Corpus for Microbiome Knowledge Base Construction
William Hogan | Andrew Bartko | Jingbo Shang | Chun-Nan Hsu
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

The interplay between microbiota and diseases has emerged as a significant area of research, facilitated by the proliferation of cost-effective and precise sequencing technologies. To keep track of the many findings, domain experts manually review publications to extract reported microbe-disease associations and compile them into knowledge bases. However, manual curation efforts struggle to keep up with the pace of publication. Relation extraction has demonstrated remarkable success in other domains, yet datasets supporting such methods in microbiome research remain limited. To bridge this gap, we introduce the Microbe-Disease Relation Extraction Dataset (MiDRED), a human-annotated dataset containing 3,116 annotations of fine-grained relationships between microbes and diseases. We hope this dataset will help address the scarcity of data in this crucial domain and facilitate the development of advanced text-mining solutions to automate the creation and maintenance of microbiome knowledge bases.
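
For illustration, here is a minimal sketch of how a microbe-disease relation annotation could be represented in Python. The field names, character offsets, and relation label are assumptions for this example, not the actual MiDRED schema.

    # Illustrative sketch only: field names and relation labels are assumed,
    # not taken from the actual MiDRED annotation format.
    from dataclasses import dataclass

    @dataclass
    class EntityMention:
        text: str         # surface form as it appears in the sentence
        span: tuple       # (start, end) character offsets
        entity_type: str  # e.g. "Microbe" or "Disease"

    @dataclass
    class RelationAnnotation:
        sentence: str
        head: EntityMention  # microbe mention
        tail: EntityMention  # disease mention
        relation: str        # fine-grained label, e.g. "positively_associated"

    example = RelationAnnotation(
        sentence="Helicobacter pylori infection is strongly associated with gastric cancer.",
        head=EntityMention("Helicobacter pylori", (0, 19), "Microbe"),
        tail=EntityMention("gastric cancer", (58, 72), "Disease"),
        relation="positively_associated",
    )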

2023

MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Zexue He | Yu Wang | An Yan | Yao Liu | Eric Chang | Amilcare Gentili | Julian McAuley | Chun-Nan Hsu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Curated datasets for healthcare are often limited due to the need for human annotations from experts. In this paper, we present MedEval, a multi-level, multi-task, and multi-domain medical benchmark to facilitate the development of language models for healthcare. MedEval is comprehensive, consisting of data from several healthcare systems and spanning 35 human body regions across 8 examination modalities. With 22,779 collected sentences and 21,228 reports, we provide expert annotations at multiple levels, enabling granular use of the data and supporting a wide range of tasks. Moreover, we systematically evaluate 10 generic and domain-specific language models under zero-shot and finetuning settings, from domain-adapted baselines in healthcare to general-purpose state-of-the-art large language models (e.g., ChatGPT). Our evaluations reveal varying effectiveness of the two categories of language models across different tasks and highlight the importance of instruction tuning for few-shot use of large language models. Our investigation paves the way toward benchmarking language models for healthcare and provides valuable insights into the strengths and limitations of adopting large language models in medical domains, informing their practical applications and future advancements.
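
As a rough illustration of the zero-shot evaluation setting described above, the sketch below runs an off-the-shelf zero-shot classifier over a toy report-sentence task. The sentences, label set, and metric are placeholders and do not reflect MedEval's actual tasks, prompts, or models.

    # Minimal zero-shot evaluation sketch; toy data, not MedEval.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    examples = [
        ("Mild cardiomegaly with no acute pulmonary findings.", "abnormal"),
        ("The lungs are clear. No pleural effusion or pneumothorax.", "normal"),
    ]
    labels = ["normal", "abnormal"]

    correct = 0
    for sentence, gold in examples:
        pred = classifier(sentence, candidate_labels=labels)["labels"][0]
        correct += int(pred == gold)

    print(f"zero-shot accuracy: {correct / len(examples):.2f}")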

2021

BLAR: Biomedical Local Acronym Resolver
William Hogan | Yoshiki Vazquez Baeza | Yannis Katsis | Tyler Baldwin | Ho-Cheol Kim | Chun-Nan Hsu
Proceedings of the 20th Workshop on Biomedical Language Processing

NLP has emerged as an essential tool for extracting knowledge from the exponentially increasing volume of biomedical texts. Many NLP tasks, such as named entity recognition and named entity normalization, are especially challenging in the biomedical domain partly because of the prolific use of acronyms: long names for diseases, bacteria, and chemicals are often replaced by acronyms. We propose the Biomedical Local Acronym Resolver (BLAR), a high-performing acronym resolver that leverages state-of-the-art pre-trained language models to accurately resolve local acronyms in biomedical texts. We test BLAR on the Ab3P corpus and achieve state-of-the-art results, outperforming the current best-performing local acronym resolution algorithms and models.
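
For illustration only, the sketch below resolves "long form (short form)" definitions with a Schwartz-Hearst-style heuristic. BLAR itself relies on pre-trained language models rather than this rule, so treat the function as a simplified stand-in for the local acronym resolution task.

    import re

    def find_local_acronyms(text):
        """Heuristic sketch: map short forms defined as 'long form (SF)' to
        their long forms within the same document. Not BLAR's actual method."""
        pairs = {}
        for match in re.finditer(r"\(([A-Za-z0-9-]{2,10})\)", text):
            short = match.group(1)
            # take a window of words immediately before the parentheses
            prefix = text[: match.start()].split()
            candidate = prefix[-(len(short) + 5):]
            # shrink the window until the initials align with the short form
            for i in range(len(candidate)):
                initials = "".join(w[0] for w in candidate[i:]).lower()
                if initials == short.lower():
                    pairs[short] = " ".join(candidate[i:])
                    break
        return pairs

    text = "Irritable bowel syndrome (IBS) is linked to gut dysbiosis. IBS symptoms vary."
    print(find_local_acronyms(text))  # {'IBS': 'Irritable bowel syndrome'}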

Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
An Yan | Zexue He | Xing Lu | Jiang Du | Eric Chang | Amilcare Gentili | Julian McAuley | Chun-Nan Hsu
Findings of the Association for Computational Linguistics: EMNLP 2021

Radiology report generation aims to automatically generate descriptive text from radiology images, which may present an opportunity to improve radiology reporting and interpretation. A typical setting trains encoder-decoder models on image-report pairs with a cross-entropy loss, which struggles to generate informative sentences for clinical diagnoses because normal findings dominate the datasets. To tackle this challenge and encourage more clinically accurate text outputs, we propose a novel weakly supervised contrastive loss for medical report generation. Experimental results demonstrate that our method benefits from contrasting target reports with incorrect but semantically close ones, and it outperforms previous work on both clinical correctness and text generation metrics on two public benchmarks.
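
As a rough illustration of contrasting a target report with semantically close negatives, here is an InfoNCE-style loss sketch in PyTorch. The exact formulation, weighting, and negative-sampling scheme used in the paper may differ.

    import torch
    import torch.nn.functional as F

    def contrastive_report_loss(image_emb, target_emb, negative_embs, temperature=0.1):
        """InfoNCE-style sketch: pull the image embedding toward its target report
        and push it away from semantically close but incorrect reports."""
        pos = F.cosine_similarity(image_emb, target_emb, dim=-1) / temperature       # (batch,)
        neg = F.cosine_similarity(
            image_emb.unsqueeze(1), negative_embs, dim=-1                             # (batch, num_neg)
        ) / temperature
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                            # target at index 0
        labels = torch.zeros(image_emb.size(0), dtype=torch.long)
        return F.cross_entropy(logits, labels)

    # toy shapes: batch of 4 images, 8 hard negatives each, 256-d embeddings
    loss = contrastive_report_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8, 256))
    print(loss.item())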

2020

Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays
Jianmo Ni | Chun-Nan Hsu | Amilcare Gentili | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2020

Automatic medical image report generation has drawn growing attention due to its potential to alleviate radiologists’ workload. Existing work on report generation often trains encoder-decoder networks to generate complete reports. However, such models are affected by data bias (e.g., label imbalance) and face issues common to text generation models (e.g., repetition). In this work, we focus on reporting abnormal findings on radiology images; instead of training on complete radiology reports, we propose a method that identifies abnormal findings in the reports and groups them with unsupervised clustering and minimal rules. We formulate the task as cross-modal retrieval and propose Conditional Visual-Semantic Embeddings to align images and fine-grained abnormal findings in a joint embedding space. We demonstrate that our method is able to retrieve abnormal findings and outperforms existing generation models on both clinical correctness and text generation metrics.
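
To illustrate cross-modal retrieval in a joint embedding space, the sketch below uses a standard hinge-based bidirectional ranking loss over image and finding embeddings. The paper's conditional formulation is not reproduced here, and all shapes are toy placeholders.

    import torch
    import torch.nn.functional as F

    def triplet_ranking_loss(image_emb, finding_emb, margin=0.2):
        """Generic hinge-based ranking loss for cross-modal retrieval; a sketch
        of aligning images and abnormal-finding embeddings in a joint space."""
        img = F.normalize(image_emb, dim=-1)
        txt = F.normalize(finding_emb, dim=-1)
        scores = img @ txt.t()                                 # (batch, batch) cosine similarities
        pos = scores.diag().unsqueeze(1)                       # matched pairs on the diagonal
        cost_img = (margin + scores - pos).clamp(min=0)        # retrieve findings for an image
        cost_txt = (margin + scores - pos.t()).clamp(min=0)    # retrieve images for a finding
        mask = torch.eye(scores.size(0), dtype=torch.bool)
        cost_img = cost_img.masked_fill(mask, 0)
        cost_txt = cost_txt.masked_fill(mask, 0)
        return cost_img.mean() + cost_txt.mean()

    loss = triplet_ranking_loss(torch.randn(4, 256), torch.randn(4, 256))
    print(loss.item())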

2013

Reconstructing Big Semantic Similarity Networks
Ai He | Shefali Sharma | Chun-Nan Hsu
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing

2012

Exploring Label Dependency in Active Learning for Phenotype Mapping
Shefali Sharma | Leslie Lange | Jose Luis Ambite | Yigal Arens | Chun-Nan Hsu
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Learning Phenotype Mapping for Integrating Large Genetic Data
Chun-Nan Hsu | Cheng-Ju Kuo | Congxing Cai | Sarah Pendergrass | Marylyn Ritchie | Jose Luis Ambite
Proceedings of BioNLP 2011 Workshop

2008

Acoustic Model Optimization for Multilingual Speech Recognition
Dau-Cheng Lyu | Chun-Nan Hsu | Yuang-Chin Chiang | Ren-Yuan Lyu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 3, September 2008: Special Issue on Selected Papers from ROCLING XIX

2007

多語聲學單位分類之最佳化研究 (The Study of Acoustic Model Clustering in Multilingual Speech Recognition) [In Chinese]
Dau-cheng Lyu | Ren-yuan Lyu | Yung-Jien Chiang | Chun-nan Hsu
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing

2005

Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition
Dau-Cheng Lyu | Ren-Yuan Lyu | Yuang-Chin Chiang | Chun-Nan Hsu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 3, September 2005: Special Issue on Selected Papers from ROCLING XVI

2004

華台雙語發音變異性之語音辨識研究及PDA之應用 (The study of pronunciation variations in Mandarin and Taiwanese and its application in PDA) [In Chinese]
Dau-cheng Lyu | Hong-Wen Hsien | Yung-Xian Lee | Zhong-Ing Liou | Chun-Nan Hsu | Yung-Jien Chiang | Ren-yuan Lyu
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing