Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification

Jan Rygl; Aleš Horák

Abstract

In the authorship identification task, examples of short writings by N authors and an anonymous document written by one of these N authors are given. The task is to determine the authorship of the anonymous text. Practically all approaches solve this problem with machine learning methods. The input attributes for the machine learning process are usually formed by stylistic or grammatical properties of individual documents or by a defined similarity between a document and an author. In this paper, we present the results of an experiment that extends the machine learning attributes by ranking the similarity between a document and an author: we transform the similarity between an unknown document and one of the N authors into the rank of that author when all N authors are ordered by their similarity to the document. Similarity probability and similarity ranking were compared using the Support Vector Machines algorithm. The results show that machine learning methods perform slightly better with attributes based on the ranking of similarity than with the previously used similarity between an author and a document.
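
The similarity-to-rank transformation described in the abstract can be illustrated with a minimal sketch. It assumes that a similarity score between the anonymous document and each candidate author is already available and that a higher score means more similar. The function name `similarity_to_rank`, the author labels, the scores, and the tie-handling rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def similarity_to_rank(similarities: Dict[str, float]) -> Dict[str, int]:
    """Map each author's similarity score to its rank among all authors.

    Rank 1 is the most similar author. Tied authors share the best
    (lowest) rank; this tie rule is an assumption, since the abstract
    does not say how ties are handled.
    """
    ranks: Dict[str, int] = {}
    for author, score in similarities.items():
        # Rank = 1 + number of authors that are strictly more similar.
        ranks[author] = 1 + sum(
            1 for other in similarities.values() if other > score
        )
    return ranks


# Hypothetical similarity scores between one anonymous document and N = 4 authors.
scores = {"author_a": 0.62, "author_b": 0.91, "author_c": 0.35, "author_d": 0.62}
ranks = similarity_to_rank(scores)
print(ranks)
```

In a setup like the one the abstract outlines, the resulting rank, rather than the raw score, would be passed to the classifier as the attribute for each document and author pair.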

Anthology ID: L12-1354
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 726–729
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/618_Paper.pdf
Cite (ACL): Jan Rygl and Aleš Horák. 2012. Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 726–729, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification (Rygl & Horák, LREC 2012)
PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/618_Paper.pdf
