A Call for Clarity in Contemporary Authorship Attribution Evaluation

Allen Riddell, Haining Wang, Patrick Juola


Abstract
Recent research has documented that results reported in frequently-cited authorship attribution papers are difficult to reproduce. Inaccessible code and data are often proposed as factors which block successful reproductions. Even when original materials are available, problems remain which prevent researchers from comparing the effectiveness of different methods. To solve the remaining problems—the lack of fixed test sets and the use of inappropriately homogeneous corpora—our paper contributes materials for five closed-set authorship identification experiments. The five experiments feature texts from 106 distinct authors. Experiments involve a range of contemporary non-fiction American English prose. These experiments provide the foundation for comparable and reproducible authorship attribution research involving contemporary writing.
Anthology ID:
2021.ranlp-1.132
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1174–1179
URL:
https://aclanthology.org/2021.ranlp-1.132
Cite (ACL):
Allen Riddell, Haining Wang, and Patrick Juola. 2021. A Call for Clarity in Contemporary Authorship Attribution Evaluation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1174–1179, Held Online. INCOMA Ltd.
Cite (Informal):
A Call for Clarity in Contemporary Authorship Attribution Evaluation (Riddell et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.132.pdf