Anni Zou


2024

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Xiangru Tang | Anni Zou | Zhuosheng Zhang | Ziming Li | Yilun Zhao | Xingyao Zhang | Arman Cohan | Mark Gerstein
Findings of the Association for Computational Linguistics: ACL 2024

Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminology and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending their reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents.

AuRoRA: A One-for-all Platform for Augmented Reasoning and Refining with Task-Adaptive Chain-of-Thought Prompting
Anni Zou | Zhuosheng Zhang | Hai Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) empowered by chain-of-thought (CoT) prompting have shown remarkable prowess in reasoning tasks. Nevertheless, current methods predominantly rely on handcrafted or task-specific demonstrations and lack a reliable knowledge basis, and thus struggle to produce trustworthy responses in an automated fashion. While recent works endeavor to improve a single aspect, they ignore the importance and necessity of establishing an integrated and interpretable reasoning system. To address these drawbacks and provide a universal solution, we propose AuRoRA: a one-for-all platform for augmented reasoning and refining based on CoT prompting that excels in adaptability, reliability, integrity, and interpretability. The system exhibits superior performance across six reasoning tasks and offers real-time visual analysis, which has pivotal academic and application value in the era of LLMs. The AuRoRA platform is available at https://huggingface.co/spaces/Anni123/AuRoRA.

2023

Decker: Double Check with Heterogeneous Knowledge for Commonsense Fact Verification
Anni Zou | Zhuosheng Zhang | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Commonsense fact verification, a challenging branch of commonsense question-answering (QA), aims to verify through facts whether a given commonsense claim is correct. Answering commonsense questions requires combining knowledge from various levels. However, existing studies primarily rest on grasping either unstructured evidence or potential reasoning paths from structured knowledge bases, and fail to exploit the benefits of heterogeneous knowledge simultaneously. In light of this, we propose Decker, a commonsense fact verification model that is capable of bridging heterogeneous knowledge by uncovering latent relationships between structured and unstructured knowledge. Experimental results on two commonsense fact verification benchmark datasets, CSQA2.0 and CREAK, demonstrate the effectiveness of Decker, and further analysis verifies its capability to capture more valuable information through reasoning. The official implementation of Decker is available at https://github.com/Anni-Zou/Decker.