Yang Liu

3M Health Information Systems

Other people with similar names: Yang Liu (Beijing Language and Culture University), Yang Liu (University of Helsinki), Yang Liu (Edinburgh Ph.D., Microsoft), Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (刘扬) (刘扬; Ph.D Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu (Microsoft Cognitive Services Research), Yang Liu (刘扬) (Peking University), Yang Liu (刘扬) (May refer to several people), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (Wilfrid Laurier University), Yang Liu (Tianjin University, China), Yang Liu (刘洋) (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Samsung Research Center Beijing), Yang Liu (National University of Defense Technology)

2021

Effective Convolutional Attention Network for Multi-label Clinical Document Classification
Yang Liu | Hua Cheng | Russell Klopfer | Matthew R. Gormley | Thomas Schaaf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multi-label document classification (MLDC) problems can be challenging, especially for long documents with a large label set and a long-tail distribution over labels. In this paper, we present an effective convolutional attention network for the MLDC problem with a focus on medical code prediction from clinical documents. Our innovations are three-fold: (1) we utilize a deep convolution-based encoder with squeeze-and-excitation networks and residual networks to aggregate information across the document and learn meaningful document representations that cover different ranges of text; (2) we explore multi-layer and sum-pooling attention to extract the most informative features from these multi-scale representations; (3) we combine binary cross entropy loss and focal loss to improve performance for rare labels. We focus our evaluation study on MIMIC-III, a widely used dataset in the medical domain. Our models outperform prior work on medical coding and achieve new state-of-the-art results on multiple metrics. We also demonstrate the language-independent nature of our approach by applying it to two non-English datasets. Our model outperforms the prior best model and a multilingual Transformer model by a substantial margin.
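
The abstract says binary cross entropy loss and focal loss are combined, but it does not say how. As a rough illustration only, the sketch below mixes the two per-label losses with a weighted sum. The weighted-sum form, the names `bce_term`, `focal_term`, and `combined_loss`, and the parameters `mix` and `gamma` are all assumptions made for this example and are not taken from the paper.

```python
import math


def bce_term(p, y, eps=1e-12):
    """Binary cross entropy for one label: -log(p_t)."""
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_term(p, y, gamma=2.0, eps=1e-12):
    """Focal loss for one label: -(1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    # The (1 - p_t)^gamma factor shrinks the loss of easy, well-classified labels.
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))


def combined_loss(probs, labels, mix=0.5, gamma=2.0):
    """Mean over labels of mix * BCE + (1 - mix) * focal.

    ASSUMPTION: the weighted sum and the `mix` weight are hypothetical;
    the source only states that the two losses are combined.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += mix * bce_term(p, y) + (1.0 - mix) * focal_term(p, y, gamma)
    return total / len(probs)


# Hypothetical predictions for four labels of one document.
probs = [0.9, 0.2, 0.6, 0.05]
labels = [1, 0, 0, 1]
loss = combined_loss(probs, labels)
```

With `gamma=0` the focal term reduces to plain binary cross entropy, so the two losses differ only in how strongly confident, correct predictions are down-weighted. That down-weighting is the usual motivation for using focal loss when most labels are rare.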
