Behind the Mask: Demographic bias in name detection for PII masking

Courtney Mansfield, Amandalynne Paullada, Kristen Howell


Abstract
Many datasets contain personally identifiable information, or PII, which poses privacy risks to individuals. PII masking is commonly used to redact personal information such as names, addresses, and phone numbers from text data. Most modern PII masking pipelines involve machine learning algorithms. However, these systems may vary in performance, such that individuals from particular demographic groups bear a higher risk of having their personal information exposed. In this paper, we evaluate the performance of three off-the-shelf PII masking systems on name detection and redaction. We generate data using names and templates from the customer service domain. We find that an open-source RoBERTa-based system shows fewer disparities than the commercial models we test. However, all systems demonstrate significant differences in error rate based on demographics. In particular, the highest error rates occur for names associated with Black and Asian/Pacific Islander individuals.
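The evaluation setup described in the abstract (filling templates with names, running a masking system, and comparing error rates across groups) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the templates, the names, the placeholder labels `group_a` and `group_b`, and the `toy_masker` function are hypothetical stand-ins, not the paper's data, systems, or metric definitions.

```python
from collections import defaultdict

# Hypothetical templates with a {name} slot (not the paper's templates).
TEMPLATES = [
    "Hello, my name is {name} and I need help with my order.",
    "Please update the account for {name}.",
]

# Hypothetical names with placeholder group labels (not the paper's data).
NAMES = [
    ("Alice", "group_a"),
    ("Brenda", "group_a"),
    ("Chidi", "group_b"),
    ("Deepa", "group_b"),
]


def toy_masker(text, known_names=frozenset({"Alice", "Brenda", "Chidi"})):
    """Stand-in for a PII masking system: masks only the names it knows."""
    for name in known_names:
        text = text.replace(name, "[NAME]")
    return text


def miss_rate_by_group(templates, names, masker):
    """Per group, the fraction of filled templates where the name survives masking."""
    missed = defaultdict(int)
    total = defaultdict(int)
    for name, group in names:
        for template in templates:
            masked = masker(template.format(name=name))
            total[group] += 1
            if name in masked:  # the name was not redacted
                missed[group] += 1
    return {group: missed[group] / total[group] for group in total}


rates = miss_rate_by_group(TEMPLATES, NAMES, toy_masker)
print(rates)
```

In this toy setup, a gap between the per-group miss rates is the kind of disparity the abstract refers to; a real evaluation would swap `toy_masker` for an actual masking system and use a much larger set of names and templates.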
Anthology ID:
2022.ltedi-1.10
Volume:
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
LTEDI
Publisher:
Association for Computational Linguistics
Pages:
76–89
URL:
https://aclanthology.org/2022.ltedi-1.10
DOI:
10.18653/v1/2022.ltedi-1.10
Cite (ACL):
Courtney Mansfield, Amandalynne Paullada, and Kristen Howell. 2022. Behind the Mask: Demographic bias in name detection for PII masking. In Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, pages 76–89, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Behind the Mask: Demographic bias in name detection for PII masking (Mansfield et al., LTEDI 2022)
PDF:
https://aclanthology.org/2022.ltedi-1.10.pdf
Video:
https://aclanthology.org/2022.ltedi-1.10.mp4
Code:
csmansfield/pii-masking-bias