Lydia Chelala


2025

pdf bib
CLEAR: A Clinically Grounded Tabular Framework for Radiology Report Evaluation
Yuyang Jiang | Chacha Chen | Shengyuan Wang | Feng Li | Zecong Tang | Benjamin M. Mervak | Lydia Chelala | Christopher M Straus | Reve Chahine | Samuel G. Armato Iii | Chenhao Tan
Findings of the Association for Computational Linguistics: EMNLP 2025

Existing metrics often lack the granularity and interpretability to capture nuanced clinical differences between candidate and ground-truth radiology reports, resulting in suboptimal evaluation. We introduce a **Cl**inically grounded tabular framework with **E**xpert-curated labels and **A**ttribute-level comparison for **R**adiology report evaluation (**CLEAR**). CLEAR not only examines whether a report can accurately identify the presence or absence of medical conditions, but it also assesses whether the report can precisely describe each positively identified condition across five key attributes: first occurrence, change, severity, descriptive location, and recommendation. Compared with prior works, CLEAR’s multi-dimensional, attribute-level outputs enable a more comprehensive and clinically interpretable evaluation of report quality. Additionally, to measure the clinical alignment of CLEAR, we collaborated with five board-certified radiologists to develop **CLEAR-Bench**, a dataset of 100 chest radiograph reports from MIMIC-CXR, annotated across 6 curated attributes and 13 CheXpert conditions. Our experiments demonstrated that CLEAR achieves high accuracy in extracting clinical attributes and provides automated metrics that are strongly aligned with clinical judgment.