2024
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature
Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
Recent advances in large language models (LLMs) have yielded promising performance across a wide range of applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK (Dynamic Co-Augmentation of LLMs and Knowledge Graphs) to address this limitation, and we demonstrate its effectiveness on Alzheimer’s Disease (AD), a specialized sub-field of biomedicine and a global health priority. Within a synergized framework in which the LLM and the knowledge graph (KG) mutually enhance each other, we first use an LLM to construct an evolving AD-specific KG from AD-related scientific literature, and then apply a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference. Experimental results on our newly constructed AD question answering (ADQA) benchmark demonstrate the efficacy of DALK. We also present a series of detailed analyses that offer insights and guidelines for the emerging topic of mutually enhancing KGs and LLMs.
2023
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses
Liyan Tang, Yifan Peng, Yanshan Wang, Ying Ding, Greg Durrett, Justin Rousseau
Findings of the Association for Computational Linguistics: ACL 2023
A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating an interpretation of a radiology report given its findings, a system that predicts only highly likely outcomes may be of limited use, since such outcomes are often already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis that goes beyond the most likely options. We introduce a new task, “less likely brainstorming,” which asks a model to generate outputs that humans consider relevant but less likely to happen. We explore the task in two settings: brain MRI interpretation generation and everyday commonsense reasoning. We find that a baseline trained with less likely hypotheses as targets generates outputs that humans judge as either likely or irrelevant nearly half of the time; that is, standard maximum likelihood estimation (MLE) training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs, as judged by humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that our models are better at generating less likely outputs.
2022
EchoGen: Generating Conclusions from Echocardiogram Notes
Liyan Tang, Shravan Kooragayalu, Yanshan Wang, Ying Ding, Greg Durrett, Justin F. Rousseau, Yifan Peng
Proceedings of the 21st Workshop on Biomedical Language Processing
Generating a summary from findings has recently been explored (Zhang et al., 2018, 2020) for note types such as radiology reports, which are typically short. In this work, we focus on echocardiogram notes, which are longer and more complex than previously studied note types. We formally define the task of echocardiography conclusion generation (EchoGen) as generating a conclusion given the findings section, with emphasis on key cardiac findings. To promote the development of EchoGen methods, we present a new benchmark consisting of two datasets collected from two hospitals. We further compare both standard and state-of-the-art methods on this benchmark, with an emphasis on factual consistency. To accomplish this, we develop a tool to automatically extract concept-attribute tuples from the text. We then propose an evaluation metric, FactComp, which compares concept-attribute tuples between the human reference and the generated conclusions. Both automatic and human evaluations show that there is still a significant gap between human-written and machine-generated conclusions on echo reports in terms of factuality and overall quality.
2015
A Joint Model of Product Properties, Aspects and Ratings for Online Reviews
Ying Ding, Jing Jiang
Proceedings of the International Conference Recent Advances in Natural Language Processing
Towards Opinion Summarization from Online Forums
Ying Ding, Jing Jiang
Proceedings of the International Conference Recent Advances in Natural Language Processing
2014
A Unified Topic-Style Model for Online Discussions
Ying Ding, Jing Jiang, Qiming Diao
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media