Towards Open-Domain Twitter User Profile Inference

Haoyang Wen, Zhenxin Xiao, Eduard Hovy, Alexander Hauptmann


Abstract
Twitter user profile inference uses information from Twitter to predict user attributes (e.g., occupation, location). The task is controversial: it is useful for downstream applications, but it can also expose users' private information. It is therefore important for researchers to determine the extent of profiling in a safe environment, both to facilitate proper use and to make the public aware of the potential risks. Unlike existing approaches, which target a limited set of attributes, we explore open-domain Twitter user profile inference. We conduct a case study in which we collect publicly available WikiData profiles of public figures and use diverse WikiData predicates for profile inference. After removing sensitive attributes, our data contains over 150K public figure profiles from WikiData, over 50 different attribute predicates, and over 700K attribute values. We further propose a prompt-based generation method that can infer values mentioned implicitly in the Twitter information. Experimental results show that the generation-based approach can infer more comprehensive user profiles than baseline extraction-based methods, but limitations remain before it can be applied in real-world use. We also include a detailed ethical statement covering our data, the potential benefits and risks of this work, and our efforts to mitigate those risks.
Anthology ID: 2023.findings-acl.198
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3172–3188
URL: https://aclanthology.org/2023.findings-acl.198
DOI: 10.18653/v1/2023.findings-acl.198
Cite (ACL): Haoyang Wen, Zhenxin Xiao, Eduard Hovy, and Alexander Hauptmann. 2023. Towards Open-Domain Twitter User Profile Inference. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3172–3188, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Towards Open-Domain Twitter User Profile Inference (Wen et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.198.pdf