Multi-Token Completion for Text Anonymization

Pulkit Madaan; Krithika Ramesh; Lisa Bauer; Charith Peris; Anjalie Field

Abstract

Text anonymization is a critical task for enabling research and development in high-stakes domains containing private data, like medicine, law, and social services. While much research has focused on redacting sensitive content from text, substantially less work has focused on what to replace redacted content with, which can enhance privacy and becomes increasingly important with greater levels of redaction. In this work, we formulate predicting replacements for sensitive spans as a research task with principled use-inspired evaluation criteria. We further propose a multi-token completion method for accomplishing this task that is designed to preserve consistency with low compute requirements, thus enabling practitioners to anonymize data locally before sharing it externally. Human and automated annotations demonstrate that our approach produces more realistic text and better preserves utility than alternative infilling methods and differentially private mechanisms across multiple domains without retraining. Overall, our work explores the under-studied task of what to replace redacted content with and contributes grounded evaluations capturing utility, facilitating future work.

Anthology ID: 2026.eacl-long.276
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 5894–5908
URL: https://aclanthology.org/2026.eacl-long.276/
Cite (ACL): Pulkit Madaan, Krithika Ramesh, Lisa Bauer, Charith Peris, and Anjalie Field. 2026. Multi-Token Completion for Text Anonymization. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5894–5908, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Multi-Token Completion for Text Anonymization (Madaan et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.276.pdf
Checklist: 2026.eacl-long.276.checklist.pdf
