Yue Pan


Large Scale Sequence-to-Sequence Models for Clinical Note Generation from Patient-Doctor Conversations
Gagandeep Singh | Yue Pan | Jesus Andres-Ferrer | Miguel Del-Agua | Frank Diehl | Joel Pinto | Paul Vozila
Proceedings of the 5th Clinical Natural Language Processing Workshop

We present our work on building large-scale sequence-to-sequence models for generating clinical notes from patient-doctor conversations. This is formulated as an abstractive summarization task, for which we use an encoder-decoder transformer model with a pointer-generator. We discuss various modeling enhancements to this baseline model, including subword and multiword tokenization schemes, prefixing the targets with a chain-of-clinical-facts, and training with a contrastive loss defined over various candidate summaries. We also use flash attention during training and query-chunked attention during inference to process long input and output sequences and to improve computational efficiency. Experiments are conducted on a dataset containing about 900K encounters from around 1,800 healthcare providers covering 27 specialties. The results are broken down into primary care and non-primary care specialties, and consistent accuracy improvements are observed across both categories.


A Comparative Study of Collocation Extraction Methods from the Perspectives of Vocabulary and Grammar: A Case Study in the Field of Journalism
Lulu Gu | Yue Pan | Pengyuan Liu
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation


Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models
Seppo Enarvi | Marilisa Amoia | Miguel Del-Agua Teba | Brian Delaney | Frank Diehl | Stefan Hahn | Kristina Harris | Liam McGrath | Yue Pan | Joel Pinto | Luca Rubini | Miguel Ruiz | Gagandeep Singh | Fabian Stemmer | Weiyi Sun | Paul Vozila | Thomas Lin | Ranjani Ramamurthy
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

We discuss the automatic creation of medical reports from ASR-generated patient-doctor conversational transcripts using an end-to-end neural summarization approach. We explore both recurrent neural network (RNN) and Transformer-based sequence-to-sequence architectures for summarizing medical conversations. We have incorporated enhancements to these architectures, such as the pointer-generator network, which facilitates copying parts of the conversations into the reports, and a hierarchical RNN encoder that makes RNN training three times faster with long inputs. A comparison of the relative improvements of the different model architectures over an oracle extractive baseline is provided on a dataset of 800k orthopedic encounters. Consistent with observations in the literature for machine translation and related tasks, we find that the Transformer models outperform RNNs in accuracy while taking less than half the time to train. Significant gains over a strong oracle baseline indicate that sequence-to-sequence modeling is a promising approach for the automatic generation of medical reports when data is available at scale.


Improved-Edit-Distance Kernel for Chinese Relation Extraction
Wanxiang Che | Jianmin Jiang | Zhong Su | Yue Pan | Ting Liu
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts


Advances in meeting recognition
Alex Waibel | Hua Yu | Tanja Schultz | Yue Pan | Michael Bett | Martin Westphal | Hagen Soltau | Thomas Schaaf | Florian Metze
Proceedings of the First International Conference on Human Language Technology Research