An Entity Resolution Approach to Isolate Instances of Human Trafficking Online

Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski


Abstract
Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as ‘escort advertisements’ on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper, we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain-specific characteristics of our resolved entities.
Anthology ID:
W17-4411
Volume:
Proceedings of the 3rd Workshop on Noisy User-generated Text
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Leon Derczynski, Wei Xu, Alan Ritter, Tim Baldwin
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
77–84
URL:
https://aclanthology.org/W17-4411
DOI:
10.18653/v1/W17-4411
Cite (ACL):
Chirag Nagpal, Kyle Miller, Benedikt Boecking, and Artur Dubrawski. 2017. An Entity Resolution Approach to Isolate Instances of Human Trafficking Online. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 77–84, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
An Entity Resolution Approach to Isolate Instances of Human Trafficking Online (Nagpal et al., WNUT 2017)
PDF:
https://aclanthology.org/W17-4411.pdf