Whodunit? Learning to Contrast for Authorship Attribution

Bo Ai; Yuchen Wang; Yugin Tan; Samson Tan

Abstract

Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset’s content and style, but these approaches are dataset-dependent and yield inconsistent performance across corpora. In this work, we propose to learn author-specific representations by fine-tuning pre-trained generic language representations with a contrastive objective (Contra-X). We show that Contra-X learns representations that form highly separable clusters for different authors. It advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8% over cross-entropy fine-tuning. However, we find that Contra-X improves overall accuracy at the cost of sacrificing performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to integrate contrastive learning with pre-trained language model fine-tuning for authorship attribution.
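
As a rough illustration of what a contrastive objective over author labels can look like, the sketch below computes a generic supervised contrastive loss on a batch of text embeddings with NumPy. The abstract does not give the exact loss used by Contra-X, so the function name `supervised_contrastive_loss`, the `temperature` value, and the specific formulation are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    Assumption: this is a common textbook formulation, not necessarily
    the one used in the paper. Texts by the same author are treated as
    positives; all other texts in the batch act as negatives.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(author_ids)
    n = z.shape[0]

    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        # No same-author pair in the batch: nothing to contrast.
        return 0.0

    # L2-normalise so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Log-softmax over every other sample in the batch (self excluded).
    sim = np.where(not_self, sim, -np.inf)
    sim_max = sim.max(axis=1, keepdims=True)
    shifted = sim - sim_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of same-author pairs for each anchor
    # that has at least one positive, then average over those anchors.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())


# Toy batch: two tight groups of 2-D "embeddings".
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_clustered = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

On this toy batch the loss is lower when the author labels agree with the geometric clusters than when they cut across them, which is the property that pushes representations of the same author together. In a fine-tuning setup, a term like this would presumably be combined with the usual cross-entropy classification loss; how the two are weighted is not stated in the abstract.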

Anthology ID: 2022.aacl-main.84
Volume: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: November
Year: 2022
Address: Online only
Editors: Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues: AACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 1142–1157
URL: https://aclanthology.org/2022.aacl-main.84
Cite (ACL): Bo Ai, Yuchen Wang, Yugin Tan, and Samson Tan. 2022. Whodunit? Learning to Contrast for Authorship Attribution. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1142–1157, Online only. Association for Computational Linguistics.
Cite (Informal): Whodunit? Learning to Contrast for Authorship Attribution (Ai et al., AACL-IJCNLP 2022)
PDF: https://aclanthology.org/2022.aacl-main.84.pdf