MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing

Minghao Liu, Zhitao He, Zhiyuan Fan, Qingyun Wang, Yi R. Fung


Abstract
Text-guided image editing has seen significant progress in natural image domains, but its application in medical imaging remains limited and lacks standardized evaluation frameworks. Such editing could revolutionize clinical practices by enabling personalized surgical planning, enhancing medical education, and improving patient communication. To bridge this gap, we introduce MedEBench, a robust benchmark designed to diagnose reliability in text-guided medical image editing. MedEBench consists of 1,182 clinically curated image-prompt pairs covering 70 distinct editing tasks and 13 anatomical regions. It contributes in three key areas: (1) a clinically grounded evaluation framework that measures Editing Accuracy, Context Preservation, and Visual Quality, complemented by detailed descriptions of intended edits and corresponding Region-of-Interest (ROI) masks; (2) a comprehensive comparison of seven state-of-the-art models, revealing consistent patterns of failure; and (3) a diagnostic error analysis technique that leverages attention alignment, using Intersection-over-Union (IoU) between model attention maps and ROI masks to identify mislocalization issues, where models erroneously focus on incorrect anatomical regions. MedEBench sets the stage for developing more reliable and clinically effective text-guided medical image editing tools.
Anthology ID:
2025.findings-emnlp.41
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
767–791
Language:
URL:
https://aclanthology.org/2025.findings-emnlp.41/
DOI:
Bibkey:
Cite (ACL):
Minghao Liu, Zhitao He, Zhiyuan Fan, Qingyun Wang, and Yi R. Fung. 2025. MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 767–791, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing (Liu et al., Findings 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.findings-emnlp.41.pdf
Checklist:
 2025.findings-emnlp.41.checklist.pdf