Unsupervised Anomaly Detection in Parole Hearings using Language Models

Graham Todd; Catalin Voss; Jenny Hong

doi:10.18653/v1/2020.nlpcss-1.8

Abstract

Each year, thousands of roughly 150-page parole hearing transcripts in California go unread because legal experts lack the time to review them. Yet reviewing transcripts is the only means of public oversight in the parole process. To assist reviewers, we present a simple unsupervised technique for using language models (LMs) to identify procedural anomalies in long-form legal text. Our technique highlights unusual passages that suggest further review could be necessary. We identify these passages using a contrastive perplexity score, defined for each passage as the scaled difference between its perplexities under two LMs: one fine-tuned on the target (parole) domain, and another pre-trained on out-of-domain text to normalize for grammatical or syntactic anomalies. We present a quantitative analysis of the results and note that our method has identified some important cases for review. We are also excited about potential applications in unsupervised anomaly detection, and present a brief analysis of results for detecting fake TripAdvisor reviews.
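
The contrastive perplexity score described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scaling or the order of subtraction, so the form used here, `scale * (PPL_in_domain - PPL_out_of_domain)`, is an assumption, as are the function names, the `scale` parameter, and the toy log-probabilities. In practice the per-token log-probabilities would come from the two language models; here they are hard-coded so the example is self-contained.

```python
import math
from typing import Dict, List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def contrastive_score(
    in_domain_logprobs: Sequence[float],
    out_domain_logprobs: Sequence[float],
    scale: float = 1.0,  # hypothetical scaling factor; not specified in the abstract
) -> float:
    """Assumed form: scale * (in-domain perplexity - out-of-domain perplexity).

    Under this assumption, a high score marks a passage that surprises the
    in-domain LM more than the general-purpose LM.
    """
    return scale * (
        perplexity(in_domain_logprobs) - perplexity(out_domain_logprobs)
    )


def rank_passages(
    passages: Dict[str, Tuple[List[float], List[float]]]
) -> List[Tuple[str, float]]:
    """Sort passages from highest to lowest contrastive score."""
    scored = [
        (name, contrastive_score(in_lp, out_lp))
        for name, (in_lp, out_lp) in passages.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: (in-domain log-probs, out-of-domain log-probs) per passage.
toy_passages = {
    "routine": ([math.log(0.5)] * 3, [math.log(0.25)] * 3),
    "unusual": ([math.log(0.1)] * 3, [math.log(0.25)] * 3),
}
ranked = rank_passages(toy_passages)
```

In this toy data both passages are equally likely under the out-of-domain model, so the ranking is driven entirely by how surprising each one is to the in-domain model; the "unusual" passage is ranked first.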

Anthology ID: 2020.nlpcss-1.8
Volume: Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Month: November
Year: 2020
Address: Online
Editors: David Bamman, Dirk Hovy, David Jurgens, Brendan O'Connor, Svitlana Volkova
Venue: NLP+CSS
Publisher: Association for Computational Linguistics
Pages: 66–71
URL: https://aclanthology.org/2020.nlpcss-1.8/
DOI: 10.18653/v1/2020.nlpcss-1.8
Cite (ACL): Graham Todd, Catalin Voss, and Jenny Hong. 2020. Unsupervised Anomaly Detection in Parole Hearings using Language Models. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 66–71, Online. Association for Computational Linguistics.
Cite (Informal): Unsupervised Anomaly Detection in Parole Hearings using Language Models (Todd et al., NLP+CSS 2020)
PDF: https://aclanthology.org/2020.nlpcss-1.8.pdf
Optional supplementary material: 2020.nlpcss-1.8.OptionalSupplementaryMaterial.zip
Video: https://slideslive.com/38940611
