Xinwen Xu
2025
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
Xingjian Diao | Weiyi Wu | Keyi Kong | Peijun Qing | Xinwen Xu | Ming Cheng | Soroush Vosoughi | Jiang Gui
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Visual Question Answering (VQA) is increasingly used in diverse applications ranging from general visual reasoning to safety-critical domains such as medical imaging and autonomous systems, where models must provide not only accurate answers but also explanations that humans can easily understand and verify. Prototype-based modeling has shown promise for interpretability by grounding predictions in semantically meaningful regions for purely visual reasoning tasks, yet remains underexplored in the context of VQA. We present ProtoVQA, a unified prototypical framework that (i) learns question-aware prototypes that serve as reasoning anchors, connecting answers to discriminative image regions, (ii) applies spatially constrained matching to ensure that the selected evidence is coherent and semantically relevant, and (iii) supports both answering and grounding tasks through a shared prototype backbone. To assess explanation quality, we propose the Visual–Linguistic Alignment Score (VLAS), which measures how well the model’s attended regions align with ground-truth evidence. Experiments on Visual7W show that ProtoVQA yields faithful, fine-grained explanations while maintaining competitive accuracy, advancing the development of transparent and trustworthy VQA systems.
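The abstract does not spell out how VLAS is computed, so the sketch below is only a rough illustration of the general idea of scoring attended regions against ground-truth evidence: it thresholds an attention map to its top-k% pixels and takes intersection-over-union with an evidence mask. The function name `alignment_score`, the top-k thresholding, and the IoU choice are all assumptions, not the paper's definition.

```python
import numpy as np

def alignment_score(attn: np.ndarray, gt_mask: np.ndarray,
                    top_k_frac: float = 0.1) -> float:
    """Hypothetical attention-evidence alignment score (not the paper's VLAS).

    attn: non-negative attention map over the image (H x W).
    gt_mask: binary ground-truth evidence mask of the same shape.
    """
    assert attn.shape == gt_mask.shape
    # Treat the top-k% most-attended pixels as the model's claimed evidence.
    k = max(1, int(top_k_frac * attn.size))
    thresh = np.partition(attn.ravel(), -k)[-k]
    pred_mask = attn >= thresh
    gt = gt_mask.astype(bool)
    # Intersection-over-union between attended and ground-truth regions.
    inter = np.logical_and(pred_mask, gt).sum()
    union = np.logical_or(pred_mask, gt).sum()
    return float(inter) / float(union) if union else 0.0
```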
Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models
Weiyi Wu | Xinwen Xu | Chongyang Gao | Xingjian Diao | Siting Li | Lucas A. Salas | Jiang Gui
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) offer transformative potential across diverse fields, yet their safe and effective deployment is hindered by inherent knowledge conflicts—stemming from temporal evolution, divergent sources, and contradictory guidelines. This challenge is particularly acute in medicine, an interdisciplinary frontier for NLP. Rapid medical concept drift can lead LLMs to provide incorrect or outdated advice, impacting their utility and the broader societal benefits of NLP advances. This study introduces ConflictMedQA, a benchmark designed to systematically evaluate how LLMs manage varied knowledge conflicts in clinical guidelines. Our assessment of seven state-of-the-art models across 4,290 scenarios reveals significant difficulties in rejecting incorrect recommendations and frequent endorsement of conflicting advice, highlighting an important gap for NLP systems intended for real-world impact. We explore two fundamental mitigation approaches: retrieval-augmented generation and preference fine-tuning via direct preference optimization. While each offers improvements, their synergistic combination yields the best results. These findings emphasize the need for LLMs to discern subtle but critical guideline conflicts. This is a crucial step in advancing NLP’s capabilities and ensuring its dependable application in critical societal domains. The proposed dataset is available at https://huggingface.co/datasets/RDBH/DriftMed.
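Of the two mitigation approaches named, direct preference optimization has a standard published objective (Rafailov et al., 2023). The minimal PyTorch sketch below shows that standard DPO loss only; how the paper pairs guideline-consistent versus conflicting answers, and its choice of beta, are assumptions here, not details from the abstract.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective over summed token log-probs of each answer."""
    # Implicit reward of each answer: log-prob ratio of policy vs. frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the preferred and the conflicting answer.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch with hypothetical log-prob values, preferred answer first.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-11.0]),
                torch.tensor([-12.5]), torch.tensor([-11.2]))
```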
Co-authors
- Xingjian Diao 2
- Jiang Gui 2
- Weiyi Wu 2
- Ming Cheng 1
- Chongyang Gao 1