Modelling the Relative Contributions of Stylistic Features in Forensic Authorship Attribution

G. Çağatay Sat; John Blake; Evgeny Pyshkin

Abstract

This paper explores the extent to which stylistic features contribute to the task of authorship attribution in forensic contexts. Drawing on a filtered subset of the Enron email corpus, the study operationalizes stylistic indicators across four groups: lexical, syntactic, orthographic, and discoursal. Using the R programming language for feature engineering and logistic regression modelling, we systematically assess both the individual and interactive effects of these features on attribution accuracy. Results show that n-gram similarity consistently outperformed all other features, with the combined model of n-gram similarity and its interaction with other features achieving accuracy, precision, and F1 scores of 91.6%, 93.3%, and 91.7%, respectively. The model was subsequently evaluated on a subset of the TEL corpus to assess its applicability in a forensic setting. The findings highlight the dominant role of lexical similarity and suggest that integrating interaction effects can yield further performance gains in forensic authorship analysis.
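
As a rough illustration of the kind of quantities the abstract mentions, the sketch below computes an n-gram similarity between two texts, forms an interaction term between two features, and derives accuracy, precision, and F1 from predicted labels. It is not the authors' implementation: the paper used R, and the abstract does not say whether the n-grams are word-level or character-level or which similarity measure was used. Character trigrams, cosine similarity, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text (assumed setup)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(text_a, text_b, n=3):
    """Cosine similarity between the n-gram count profiles of two texts."""
    a, b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is shorter than n characters
    return dot / (norm_a * norm_b)


def interaction(x1, x2):
    """Interaction term: the product of two feature values, as in x1:x2."""
    return x1 * x2


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and F1 for binary labels (1 = same author)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}
```

In a logistic regression, the similarity score and its products with other stylistic features would enter as predictors; fitting that model is left out of this sketch.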

Anthology ID: 2025.ranlp-1.123
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 1066–1073
URL: https://aclanthology.org/2025.ranlp-1.123/
Cite (ACL): G. Çağatay Sat, John Blake, and Evgeny Pyshkin. 2025. Modelling the Relative Contributions of Stylistic Features in Forensic Authorship Attribution. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1066–1073, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Modelling the Relative Contributions of Stylistic Features in Forensic Authorship Attribution (Sat et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.123.pdf
