Yang Liu
3M Health Information Systems
Other people with similar names:
Yang Janet Liu (Georgetown University; 刘洋)
Yang Liu (May refer to several people)
Yang Liu (University of Helsinki)
Yang Liu (Beijing Language and Culture University)
Yang Liu (National University of Defense Technology)
Yang Liu (Edinburgh Ph.D., Microsoft)
Yang Liu (The Chinese University of Hong Kong (Shenzhen))
Yang Liu (刘扬; Ph.D Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon)
Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence)
Yang Liu (Microsoft Cognitive Services Research)
Yang Liu (Peking University)
Yang Liu (Samsung Research Center Beijing)
Yang Liu (Tianjin University, China)
Yang Liu (Univ. of Michigan, UC Santa Cruz)
Yang Liu (Wilfrid Laurier University)
2021
Effective Convolutional Attention Network for Multi-label Clinical Document Classification
Yang Liu | Hua Cheng | Russell Klopfer | Matthew R. Gormley | Thomas Schaaf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Multi-label document classification (MLDC) problems can be challenging, especially for long documents with a large label set and a long-tail distribution over labels. In this paper, we present an effective convolutional attention network for the MLDC problem with a focus on medical code prediction from clinical documents. Our innovations are three-fold: (1) we utilize a deep convolution-based encoder with squeeze-and-excitation networks and residual networks to aggregate information across the document and learn meaningful document representations that cover different ranges of text; (2) we explore multi-layer and sum-pooling attention to extract the most informative features from these multi-scale representations; (3) we combine binary cross-entropy loss and focal loss to improve performance on rare labels. We focus our evaluation on MIMIC-III, a widely used dataset in the medical domain. Our models outperform prior work on medical coding and achieve new state-of-the-art results on multiple metrics. We also demonstrate the language-independent nature of our approach by applying it to two non-English datasets, where our model outperforms the prior best model and a multilingual Transformer model by a substantial margin.
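The abstract does not say how the two losses are combined, so the following is only an illustrative sketch of one possibility: a weighted sum of per-label binary cross-entropy and binary focal loss. The function names (`bce`, `focal`, `combined_loss`) and the parameters `weight` and `gamma` are hypothetical and not taken from the paper. Other schemes, such as training with one loss and then switching to the other, would fit the description equally well.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one label; p is the predicted probability, y is 0 or 1."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def focal(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss for one label: cross-entropy scaled by (1 - p_t)^gamma."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true class
    return -((1 - p_t) ** gamma) * math.log(p_t)

def combined_loss(probs, targets, weight=0.5, gamma=2.0):
    """Assumed combination: weighted sum of BCE and focal loss, averaged over labels."""
    terms = [
        weight * bce(p, y) + (1 - weight) * focal(p, y, gamma)
        for p, y in zip(probs, targets)
    ]
    return sum(terms) / len(terms)

# Example: three labels for one document, the last one a poorly predicted positive.
loss = combined_loss([0.9, 0.1, 0.2], [1, 0, 1])
```

The focal term shrinks the contribution of labels the model already predicts confidently, so poorly predicted labels, which are often the rare ones in a long-tail label set, account for a larger share of the loss. With `gamma=0` the focal term reduces to plain binary cross-entropy.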