Almost Clinical: Linguistic properties of synthetic electronic health records

Serge Sharoff, John Baker, Dr David Francis Hunt, Alan Simpson


Abstract
This study evaluates the linguistic and clinical suitability of synthetic electronic health records in mental health. First, we describe the rationale and the methodology for creating the synthetic corpus. Second, we examine expressions of agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals and Care plans) with the aim of understanding how large language models (LLMs) grammatically construct medical authority and patient agency through linguistic choices. While LLMs produce coherent, terminology-appropriate texts that approximate clinical practice, systematic divergences remain, including registerial shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures. The results show both the potential and the limitations of synthetic corpora for enabling large-scale linguistic research that would otherwise be impossible with genuine patient records.
Anthology ID:
2026.healing-1.10
Volume:
Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Danilova, Murathan Kurfalı, Ylva Söderfeldt, Julia Reed, Andrew Burchell
Venues:
HeaLing | WS
Publisher:
Association for Computational Linguistics
Pages:
115–126
URL:
https://aclanthology.org/2026.healing-1.10/
Cite (ACL):
Serge Sharoff, John Baker, Dr David Francis Hunt, and Alan Simpson. 2026. Almost Clinical: Linguistic properties of synthetic electronic health records. In Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026), pages 115–126, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Almost Clinical: Linguistic properties of synthetic electronic health records (Sharoff et al., HeaLing 2026)
PDF:
https://aclanthology.org/2026.healing-1.10.pdf