Hong Yu


2022

pdf bib
Generation of Patient After-Visit Summaries to Support Physicians
Pengshan Cai | Fei Liu | Adarsha Bajracharya | Joe Sills | Alok Kapoor | Weisong Liu | Dan Berlowitz | David Levy | Richeek Pradhan | Hong Yu
Proceedings of the 29th International Conference on Computational Linguistics

An after-visit summary (AVS) is a summary note given to patients after their clinical visit. It recaps what happened during their clinical visit and guides patients’ disease self-management. Studies have shown that a majority of patients found after-visit summaries useful. However, many physicians face excessive workloads and do not have time to write clear and informative summaries. In this paper, we study the problem of automatic generation of after-visit summaries and examine whether those summaries can convey the gist of clinical visits. We report our findings on a new clinical dataset that contains a large number of electronic health record (EHR) notes and their associated summaries. Our results suggest that generation of lay language after-visit summaries remains a challenging task. Crucially, we introduce a feedback mechanism that alerts physicians when an automatic summary fails to capture the important details of the clinical notes or when it contains hallucinated facts that are potentially detrimental to the summary quality. Automatic and human evaluation demonstrates the effectiveness of our approach in providing writing feedback and supporting physicians.

pdf bib
ScAN: Suicide Attempt and Ideation Events Dataset
Bhanu Pratap Singh Rawat | Samuel Kovaly | Hong Yu | Wilfred Pigeon
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients’ previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients’ suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12k+ EHR notes with 19k+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a prediction module to identify the type of suicidal behavior (SA and SI) concluded during the patient’s stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient’s hospital-stay, respectively. ScAN and ScANER are publicly available.

pdf bib
Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition
Pengshan Cai | Hui Wan | Fei Liu | Mo Yu | Hong Yu | Sachindra Joshi
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose novel AI-empowered chat bots for learning as conversation where a user does not read a passage but gains information and knowledge through conversation with a teacher bot. Our information acquisition-oriented dialogue system employs a novel adaptation of reinforced self-play so that the system can be transferred to various domains without in-domain dialogue data, and can carry out conversations both informative and attentive to users.

2021

pdf bib
Improving Formality Style Transfer with Context-Aware Rule Injection
Zonghai Yao | Hong Yu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST) by injecting multiple rules into an end-to-end BERT-based encoder and decoder model. CARI is able to learn to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models’ performance on several tweet sentiment analysis tasks. Our contributions are as follows: 1.We propose a new method, CARI, to integrate rules for pre-trained language models. CARI is context-aware and can trained end-to-end with the downstream NLP applications. 2.We have achieved new state-of-the-art results for FST on the benchmark GYAFC dataset. 3.We are the first to evaluate FST methods with extrinsic evaluation and specifically on sentiment classification tasks. We show that CARI outperformed existing rule-based FST approaches for sentiment classification.

pdf bib
CREAD: Combined Resolution of Ellipses and Anaphora in Dialogues
Bo-Hsiang Tseng | Shruti Bhargava | Jiarui Lu | Joel Ruben Antony Moniz | Dhivya Piraviperumal | Lin Li | Hong Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Anaphora and ellipses are two common phenomena in dialogues. Without resolving referring expressions and information omission, dialogue systems may fail to generate consistent and coherent responses. Traditionally, anaphora is resolved by coreference resolution and ellipses by query rewrite. In this work, we propose a novel joint learning framework of modeling coreference resolution and query rewriting for complex, multi-turn dialogue understanding. Given an ongoing dialogue between a user and a dialogue assistant, for the user query, our joint learning model first predicts coreference links between the query and the dialogue context, and then generates a self-contained rewritten user query. To evaluate our model, we annotate a dialogue based coreference resolution dataset, MuDoCo, with rewritten queries. Results show that the performance of query rewrite can be substantially boosted (+2.3% F1) with the aid of coreference modeling. Furthermore, our joint model outperforms the state-of-the-art coreference resolution model (+2% F1) on this dataset.

2020

pdf bib
Calibrating Structured Output Predictors for Natural Language Processing
Abhyuday Jagannatha | Hong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if the applications are to be deployed in a safety-critical domain such as healthcare. However the output space of such structured prediction models are often too large to directly adapt binary or multi-class calibration methods. In this study, we propose a general calibration scheme for output entities of interest in neural network based structured prediction models. Our proposed method can be used with any binary class calibration scheme and a neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirements. We show that our method outperforms current calibration techniques for Named Entity Recognition, Part-of-speech tagging and Question Answering systems. We also observe an improvement in model performance from our decoding step across several tasks and benchmark datasets. Our method improves the calibration and model performance on out-of-domain test scenarios as well.

pdf bib
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab
Yonghao Jin | Fei Li | Hong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools which are usually scattered in different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphic user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI interface enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a “what you see is what you get” (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible to be tailored to any other domains.

pdf bib
Conversational Machine Comprehension: a Literature Review
Somil Gupta | Bhanu Pratap Singh Rawat | Hong Yu
Proceedings of the 28th International Conference on Computational Linguistics

Conversational Machine Comprehension (CMC), a research track in conversational AI, expects the machine to understand an open-domain natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. While most of the research in Machine Reading Comprehension (MRC) revolves around single-turn question answering (QA), multi-turn CMC has recently gained prominence, thanks to the advancement in natural language understanding via neural language models such as BERT and the introduction of large-scale conversational datasets such as CoQA and QuAC. The rise in interest has, however, led to a flurry of concurrent publications, each with a different yet structurally similar modeling approach and an inconsistent view of the surrounding literature. With the volume of model submissions to conversational datasets increasing every year, there exists a need to consolidate the scattered knowledge in this domain to streamline future research. This literature review attempts at providing a holistic overview of CMC with an emphasis on the common trends across recently published models, specifically in their approach to tackling conversational history. The review synthesizes a generic framework for CMC models while highlighting the differences in recent approaches and intends to serve as a compendium of CMC for future researchers.

pdf bib
Dynamic Data Selection for Curriculum Learning via Ability Estimation
John P. Lalor | Hong Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.

pdf bib
Generating Accurate Electronic Health Assessment from Medical Graph
Zhichao Yang | Hong Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

One of the fundamental goals of artificial intelligence is to build computer-based expert systems. Inferring clinical diagnoses to generate a clinical assessment during a patient encounter is a crucial step towards building a medical diagnostic system. Previous works were mainly based on either medical domain-specific knowledge, or patients’ prior diagnoses and clinical encounters. In this paper, we propose a novel model for automated clinical assessment generation (MCAG). MCAG is built on an innovative graph neural network, where rich clinical knowledge is incorporated into an end-to-end corpus-learning system. Our evaluation results against physician generated gold standard show that MCAG significantly improves the BLEU and rouge score compared with competitive baseline models. Further, physicians’ evaluation showed that MCAG could generate high-quality assessments.

pdf bib
Conversational Semantic Parsing for Dialog State Tracking
Jianpeng Cheng | Devang Agrawal | Héctor Martínez Alonso | Shruti Bhargava | Joris Driesen | Federico Flego | Dain Kaplan | Dimitri Kartsaklis | Lin Li | Dhivya Piraviperumal | Jason D. Williams | Hong Yu | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.

2019

pdf bib
Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds
John P. Lalor | Hao Wu | Hong Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.

pdf bib
Generating Classical Chinese Poems from Vernacular Chinese
Zhichao Yang | Pengshan Cai | Yansong Feng | Fei Li | Weijiang Feng | Elena Suet-Ying Chiu | Hong Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Classical Chinese poetry is a jewel in the treasure house of Chinese culture. Previous poem generation models only allow users to employ keywords to interfere the meaning of generated poems, leaving the dominion of generation to the model. In this paper, we propose a novel task of generating classical Chinese poems from vernacular, which allows users to have more control over the semantic of generated poems. We adapt the approach of unsupervised machine translation (UMT) to our task. We use segmentation-based padding and reinforcement learning to address under-translation and over-translation respectively. According to experiments, our approach significantly improve the perplexity and BLEU compared with typical UMT models. Furthermore, we explored guidelines on how to write the input vernacular to generate better poems. Human evaluation showed our approach can generate high-quality poems which are comparable to amateur poems.

pdf bib
Bacteria Biotope Relation Extraction via Lexical Chains and Dependency Graphs
Wuti Xiong | Fei Li | Ming Cheng | Hong Yu | Donghong Ji
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

abstract In this article, we describe our approach for the Bacteria Biotopes relation extraction (BB-rel) subtask in the BioNLP Shared Task 2019. This task aims to promote the development of text mining systems that extract relationships between Microorganism, Habitat and Phenotype entities. In this paper, we propose a novel approach for dependency graph construction based on lexical chains, so one dependency graph can represent one or multiple sentences. After that, we propose a neural network model which consists of the bidirectional long short-term memories and an attention graph convolution neural network to learn relation extraction features from the graph. Our approach is able to extract both intra- and inter-sentence relations, and meanwhile utilize syntax information. The results show that our approach achieved the best F1 (66.3%) in the official evaluation participated by 7 teams.

2018

pdf bib
Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study
John P. Lalor | Hao Wu | Tsendsuren Munkhdalai | Hong Yu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. In this work we examine the impact of a test set question’s difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question’s difficulty. In addition, as DNNs are trained on larger datasets easy questions start to have a higher probability of being answered correctly than harder questions.

pdf bib
Sentence Simplification with Memory-Augmented Neural Networks
Tu Vu | Baotian Hu | Tsendsuren Munkhdalai | Hong Yu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.

2017

pdf bib
Neural Tree Indexers for Text Understanding
Tsendsuren Munkhdalai | Hong Yu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic treebased recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.

pdf bib
Neural Semantic Encoders
Tsendsuren Munkhdalai | Hong Yu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access 1 multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publically available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.

2016

pdf bib
Bidirectional RNN for Medical Event Detection in Electronic Health Records
Abhyuday N Jagannatha | Hong Yu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Building an Evaluation Scale using Item Response Theory
John P. Lalor | Hao Wu | Hong Yu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Structured prediction models for RNN based sequence labeling in clinical text
Abhyuday Jagannatha | Hong Yu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Citation Analysis with Neural Attention Models
Tsendsuren Munkhdalai | John P. Lalor | Hong Yu
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis

2015

pdf bib
Key Concept Identification for Medical Information Retrieval
Jiaping Zheng | Hong Yu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Identifying Key Concepts from EHR Notes Using Domain Adaptation
Jiaping Zheng | Hong Yu
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

pdf bib
Mining and Ranking Biomedical Synonym Candidates from Wikipedia
Abhyuday Jagannatha | Jinying Chen | Hong Yu
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2009

pdf bib
Evaluation of the Clinical Question Answering Presentation
Yong-Gang Cao | John Ely | Lamont Antieau | Hong Yu
Proceedings of the BioNLP 2009 Workshop

2008

pdf bib
A Pilot Annotation to Investigate Discourse Connectivity in Biomedical Text
Hong Yu | Nadya Frid | Susan McRoy | Rashmi Prasad | Alan Lee | Aravind Joshi
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2006

pdf bib
BioEx: A Novel User-Interface that Accesses Images from Abstract Sentences
Hong Yu | Minsuk Lee
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
The Semantics of a Definiendum Constrains both the Lexical Semantics and the Lexicosyntactic Patterns in the Definiens
Hong Yu | Ying Wei
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

pdf bib
Exploring Text and Image Features to Classify Images in Bioscience Literature
Barry Rafkind | Minsuk Lee | Shih-Fu Chang | Hong Yu
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2003

pdf bib
Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences
Hong Yu | Vasileios Hatzivassiloglou
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing