How Private are Language Models in Abstractive Summarization?

Anthony Hughes; Nikolaos Aletras; Ning Ma

Abstract

In sensitive domains such as medicine and law, protecting sensitive information is critical, and data protection laws strictly prohibit the disclosure of personal data. This makes it challenging to share valuable data such as medical reports and legal case summaries. While language models (LMs) have shown strong performance in text summarization, it is still an open question to what extent they can produce privacy-preserving summaries from non-private source documents. In this paper, we perform a comprehensive study of privacy risks in LM-based summarization across two closed-weight and four open-weight models of different sizes and families. We experiment with both prompting and fine-tuning strategies for privacy preservation across a range of summarization datasets, including medical and legal ones. Our quantitative and qualitative analysis, including human evaluation, shows that LMs frequently leak personally identifiable information in their summaries, whereas human-written privacy-preserving summaries provide significantly higher levels of privacy protection. These findings highlight a substantial gap between current LM capabilities and human expert performance in privacy-sensitive summarization tasks.
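As an illustration of what "leaking personally identifiable information in a summary" can mean quantitatively, here is a minimal sketch of one simple leakage measure. It assumes the PII spans of each source document are already annotated as plain strings. The function name `pii_leak_rate` is hypothetical, and this is not the evaluation metric used in the paper. It reports the fraction of distinct source PII strings that reappear verbatim (case-insensitively, as whole tokens) in the generated summary.

```python
import re


def pii_leak_rate(source_pii: list[str], summary: str) -> float:
    """Hypothetical metric: share of distinct source PII strings found verbatim in the summary.

    Matching is case-insensitive and anchored so a PII string only counts when it is
    not embedded inside a longer word (e.g. "Ann" does not match "Anne").
    """
    # Deduplicate annotated PII strings and ignore empty entries.
    unique = {p.strip().casefold() for p in source_pii if p.strip()}
    if not unique:
        return 0.0  # nothing to leak
    text = summary.casefold()
    leaked = set()
    for pii in unique:
        # Lookarounds instead of \b so strings starting with "+" or "(" still match.
        pattern = r"(?<!\w)" + re.escape(pii) + r"(?!\w)"
        if re.search(pattern, text):
            leaked.add(pii)
    return len(leaked) / len(unique)


if __name__ == "__main__":
    pii = ["John Smith", "12/03/1980", "Leeds"]
    summary = "The patient, John Smith, was admitted with chest pain."
    print(pii_leak_rate(pii, summary))  # one of three PII strings leaked
```

Verbatim matching is deliberately conservative: it misses paraphrased or partially reproduced identifiers (for example, a surname alone or a reformatted date), which is one reason qualitative and human evaluation remain necessary.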

Anthology ID: 2025.emnlp-main.1531
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 30100–30118
URL: https://aclanthology.org/2025.emnlp-main.1531/
Cite (ACL): Anthony Hughes, Nikolaos Aletras, and Ning Ma. 2025. How Private are Language Models in Abstractive Summarization?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30100–30118, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): How Private are Language Models in Abstractive Summarization? (Hughes et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1531.pdf
Checklist: 2025.emnlp-main.1531.checklist.pdf