Lei Gao


pdf bib
Dialogue Medical Information Extraction with Medical-Item Graph and Dialogue-Status Enriched Representation
Lei Gao | Xinnan Zhang | Xian Wu | Shen Ge | Yefeng Zheng
Findings of the Association for Computational Linguistics: EMNLP 2023

The multi-turn doctor-patient dialogue includes rich medical knowledge, like the symptoms of the patient, the diagnosis and medication suggested by the doctor. If mined and represented properly, such medical knowledge can benefit a large range of clinical applications, including diagnosis assistance and medication recommendation. To derive structured knowledge from free text dialogues, we target a critical task: the Dialogue Medical Information Extraction (DMIE). DMIE aims to detect pre-defined clinical meaningful medical items (symptoms, surgery, etc.) as well as their statuses (positive, negative, etc.) from the dialogue. Existing approaches mainly formulate DMIE as a multi-label classification problem and ignore the relationships among medical items and statuses. Different from previous approaches, we propose a heterogeneous graph to model the relationship between items. We further propose two consecutive attention based modules to enrich the item representation with the dialogue and status. In this manner, we are able to model the relationships among medical items and statuses in the DMIE task. Experimental results on the public benchmark data set show that the proposed model outperforms previous works and achieves the state-of-the-art performance.


pdf bib
Modeling Document-level Causal Structures for Event Causal Relation Identification
Lei Gao | Prafulla Kumar Choubey | Ruihong Huang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We aim to comprehensively identify all the event causal relations in a document, both within a sentence and across sentences, which is important for reconstructing pivotal event structures. The challenges we identified are two: 1) event causal relations are sparse among all possible event pairs in a document, in addition, 2) few causal relations are explicitly stated. Both challenges are especially true for identifying causal relations between events across sentences. To address these challenges, we model rich aspects of document-level causal structures for achieving comprehensive causal relation identification. The causal structures include heavy involvements of document-level main events in causal relations as well as several types of fine-grained constraints that capture implications from certain sentential syntactic relations and discourse relations as well as interactions between event causal relations and event coreference relations. Our experimental results show that modeling the global and fine-grained aspects of causal structures using Integer Linear Programming (ILP) greatly improves the performance of causal relation identification, especially in identifying cross-sentence causal relations.


pdf bib
Detecting Online Hate Speech Using Context Aware Models
Lei Gao | Ruihong Huang
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In the wake of a polarizing election, the cyber world is laden with hate speech. Context accompanying a hate speech text is useful for identifying hate speech, which however has been largely overlooked in existing datasets and hate speech detection models. In this paper, we provide an annotated corpus of hate speech with context information well kept. Then we propose two types of hate speech detection models that incorporate context information, a logistic regression model with context features and a neural network model with learning components for context. Our evaluation shows that both models outperform a strong baseline by around 3% to 4% in F1 score and combining these two models further improve the performance by another 7% in F1 score.

pdf bib
Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach
Lei Gao | Alexis Kuppersmith | Ruihong Huang
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In the wake of a polarizing election, social media is laden with hateful content. To address various limitations of supervised hate speech classification methods including corpus bias and huge cost of annotation, we propose a weakly supervised two-path bootstrapping approach for an online hate speech detection model leveraging large-scale unlabeled data. This system significantly outperforms hate speech detection systems that are trained in a supervised manner using manually annotated data. Applying this model on a large quantity of tweets collected before, after, and on election day reveals motivations and patterns of inflammatory language.