Feature-Based Forensic Text Comparison Using a Poisson Model for Likelihood Ratio Estimation

Michael Carne, Shunichi Ishihara


Abstract
Score-based and feature-based methods are the two main approaches to estimating a forensic likelihood ratio (LR), which quantifies the strength of evidence. In this forensic text comparison (FTC) study, a score-based method using the Cosine distance is compared with a feature-based method built on a Poisson model, using texts collected from 2,157 authors. Distance measures (e.g. Burrows’s Delta, Cosine distance) are a standard tool in authorship attribution studies, so a score-based method using a distance measure is a natural first step for estimating LRs for textual evidence. However, textual data often violate the statistical assumptions underlying distance-based models. Furthermore, such models assess only the similarity, not the typicality, of the objects (i.e. documents) under comparison. A Poisson model is theoretically more appropriate than distance-based measures for authorship attribution, but it has never been tested with linguistic text evidence within the LR framework. The log-LR cost (Cllr) was used to assess the performance of the two methods. This study demonstrates that: (1) the feature-based method outperforms the score-based method by a Cllr value of ca. 0.09 under the best-performing settings; and (2) the performance of the feature-based method can be further improved by feature selection.
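The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes that per-feature Poisson rates for the suspected author and for the relevant population are already known and that features are independent. These are simplifications made for illustration, not the model or estimation procedure used in the paper. The function names (`poisson_lr`, `cllr`) and all numbers are hypothetical. The Cllr function follows the standard definition: the mean of log2(1 + 1/LR) over same-author comparisons plus the mean of log2(1 + LR) over different-author comparisons, halved.

```python
import math


def poisson_log_pmf(k, rate):
    """Log-probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def poisson_lr(counts, author_rates, population_rates):
    """Likelihood ratio for a vector of feature counts.

    Numerator: probability of the counts under the suspected author's rates.
    Denominator: probability of the counts under the population's rates.
    Assumption (illustrative only): rates are known and features are independent.
    """
    log_lr = sum(
        poisson_log_pmf(k, a) - poisson_log_pmf(k, p)
        for k, a, p in zip(counts, author_rates, population_rates)
    )
    return math.exp(log_lr)


def cllr(same_author_lrs, diff_author_lrs):
    """Log-LR cost: lower is better; 1.0 matches a system that always outputs LR = 1."""
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    diff_cost = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (same_cost + diff_cost)


# Hypothetical counts of two function words in a questioned document.
lr = poisson_lr(counts=[5, 2], author_rates=[5.0, 2.0], population_rates=[1.0, 2.0])
print(lr > 1)  # the counts fit the author's rates better than the population's

# Hypothetical LRs from a small set of validation comparisons.
print(cllr(same_author_lrs=[100.0, 20.0], diff_author_lrs=[0.01, 0.5]))
```

Because the denominator models how common the observed counts are in the population, an LR of this form reflects typicality as well as similarity, which is the property the abstract says distance-based scores lack.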
Anthology ID:
2020.alta-1.4
Volume:
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2020
Address:
Virtual Workshop
Editors:
Maria Kim, Daniel Beck, Meladel Mistica
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
32–42
URL:
https://aclanthology.org/2020.alta-1.4
Cite (ACL):
Michael Carne and Shunichi Ishihara. 2020. Feature-Based Forensic Text Comparison Using a Poisson Model for Likelihood Ratio Estimation. In Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association, pages 32–42, Virtual Workshop. Australasian Language Technology Association.
Cite (Informal):
Feature-Based Forensic Text Comparison Using a Poisson Model for Likelihood Ratio Estimation (Carne & Ishihara, ALTA 2020)
PDF:
https://aclanthology.org/2020.alta-1.4.pdf