Towards Task-Agnostic Privacy- and Utility-Preserving Models

Yaroslav Emelyanov


Abstract
Modern deep learning models for natural language processing rely heavily on large amounts of annotated texts. However, obtaining such texts may be difficult when they contain personal or confidential information, for example, in health or legal domains. In this work, we propose a method of de-identifying free-form text documents by carefully redacting sensitive data in them. We show that our method preserves data utility for text classification, sequence labeling and question answering tasks.
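The redaction idea in the abstract can be illustrated with a minimal sketch. The paper's actual method is model-based; the hand-written patterns and placeholder tags below are purely illustrative assumptions, not the authors' approach.

```python
import re

# Illustrative rule-based redaction: each sensitive span is replaced with a
# typed placeholder. The patterns here (email, phone) are assumptions for
# demonstration; a real de-identification system would use learned models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each sensitive pattern with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact John at john.doe@example.com or +1 555 123 4567."))
```

Replacing spans with typed placeholders (rather than deleting them) is what lets downstream tasks such as classification or question answering retain most of the document's utility.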
Anthology ID:
2021.ranlp-1.45
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Note:
Pages:
394–401
URL:
https://aclanthology.org/2021.ranlp-1.45
Cite (ACL):
Yaroslav Emelyanov. 2021. Towards Task-Agnostic Privacy- and Utility-Preserving Models. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 394–401, Held Online. INCOMA Ltd.
Cite (Informal):
Towards Task-Agnostic Privacy- and Utility-Preserving Models (Emelyanov, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.45.pdf
Data
SQuAD