Improving Long-Text Authorship Verification via Model Selection and Data Tuning

Trang Nguyen; Charlie Dagli; Kenneth Alperin; Courtland Vandam; Elliot Singer

doi:10.18653/v1/2023.latechclfl-1.4

Abstract

Authorship verification is used to link texts written by the same author without needing a model per author, which makes it useful for deanonymizing users who spread text with malicious intent. In this work, we evaluated our Cross-Encoder system with four Transformers using differently tuned variants of fanfiction data and found that our BigBird pipeline outperformed Longformer, RoBERTa, and ELECTRA and performed competitively against the official top-ranked system from the PAN evaluation. We also examined the effect of authors and fandoms not seen in training on model performance. Through this, we found that fandom has the greatest influence on true trials, and that a training dataset balanced in terms of class and fandom performed the most consistently.

Anthology ID: 2023.latechclfl-1.4
Volume: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue: LaTeCHCLfL
Publisher: Association for Computational Linguistics
Pages: 28–37
URL: https://aclanthology.org/2023.latechclfl-1.4
DOI: 10.18653/v1/2023.latechclfl-1.4
Cite (ACL): Trang Nguyen, Charlie Dagli, Kenneth Alperin, Courtland Vandam, and Elliot Singer. 2023. Improving Long-Text Authorship Verification via Model Selection and Data Tuning. In Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 28–37, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Improving Long-Text Authorship Verification via Model Selection and Data Tuning (Nguyen et al., LaTeCHCLfL 2023)
PDF: https://aclanthology.org/2023.latechclfl-1.4.pdf
Video: https://aclanthology.org/2023.latechclfl-1.4.mp4
