3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

Ivan Sviridov; Amina Miftakhova; Artemiy Tereshchenko; Galina Zubkova; Pavel Blinov; Andrey Savchenko

doi:10.18653/v1/2025.emnlp-main.1353

Abstract

Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through a temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality via an Assessor Agent. It includes 2996 cases across 34 diagnoses from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for widely used open- and closed-source LVLMs. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional neural network into the LVLM’s context boosts F1 by up to 20%. Source code is available at https://github.com/univanxx/3mdbench.

Anthology ID: 2025.emnlp-main.1353
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 26614–26654
URL: https://aclanthology.org/2025.emnlp-main.1353/
DOI: 10.18653/v1/2025.emnlp-main.1353
Cite (ACL): Ivan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, and Andrey Savchenko. 2025. 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26614–26654, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark (Sviridov et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1353.pdf
Checklist: 2025.emnlp-main.1353.checklist.pdf
