Identifying and Classifying Third-party Entities in Natural Language Privacy Policies

Mitra Bokaie Hosseini, Pragyan K C, Irwin Reyes, Serge Egelman


Abstract
App developers often raise revenue by contracting with third-party ad networks, which serve targeted ads to end users. To this end, a free app may collect data about its users and share it with advertising companies for targeting purposes. Regulations such as the General Data Protection Regulation (GDPR) require transparency with respect to the recipients (or categories of recipients) of user data. These regulations call for app developers to have privacy policies that disclose those third-party recipients of user data. Privacy policies give users transparency into what data an app will access, collect, share, and retain. Given the size of app marketplaces, verifying compliance with such regulations is a tedious task. This paper aims to develop an automated approach to extract and categorize third-party data recipients (i.e., entities) declared in privacy policies. We analyze 100 privacy policies associated with the most downloaded apps in the Google Play Store. We crowdsource the collection and annotation of app privacy policies to establish the ground truth with respect to third-party entities. From this, we train various models to extract third-party entities automatically. Our best model achieves an average F1 score of 66% when compared to crowdsourced annotations.
Anthology ID:
2020.privatenlp-1.3
Volume:
Proceedings of the Second Workshop on Privacy in NLP
Month:
November
Year:
2020
Address:
Online
Editors:
Oluwaseyi Feyisetan, Sepideh Ghanavati, Shervin Malmasi, Patricia Thaine
Venue:
PrivateNLP
Publisher:
Association for Computational Linguistics
Pages:
18–27
URL:
https://aclanthology.org/2020.privatenlp-1.3
DOI:
10.18653/v1/2020.privatenlp-1.3
Cite (ACL):
Mitra Bokaie Hosseini, Pragyan K C, Irwin Reyes, and Serge Egelman. 2020. Identifying and Classifying Third-party Entities in Natural Language Privacy Policies. In Proceedings of the Second Workshop on Privacy in NLP, pages 18–27, Online. Association for Computational Linguistics.
Cite (Informal):
Identifying and Classifying Third-party Entities in Natural Language Privacy Policies (Bokaie Hosseini et al., PrivateNLP 2020)
PDF:
https://aclanthology.org/2020.privatenlp-1.3.pdf
Video:
https://slideslive.com/38939772