WEKA in Forensic Authorship Analysis: A corpus-based approach of Saudi Authors

Mashael AlAmr, Eric Atwell


Abstract
This pilot study explores the potential of using WEKA in forensic authorship analysis. It is a corpus-based study of Twitter data collected from thirteen authors from Riyadh, Saudi Arabia. It examines the performance of unbalanced and balanced data sets across different classifiers and word-gram parameters. The attributes are dialect-specific linguistic features, represented as word grams. The findings further support previous studies in computational authorship identification.
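As a rough illustration of the kind of features and data preparation described in the abstract, the sketch below counts word n-grams (word grams) in a text and balances a labelled data set by randomly undersampling every author to the size of the smallest class. It is a minimal sketch under stated assumptions, not the authors' WEKA pipeline: the whitespace tokenisation, the hypothetical function names `word_ngrams` and `undersample`, and the choice of random undersampling as the balancing strategy are all assumptions. In WEKA itself, comparable word-gram features are usually produced with the `StringToWordVector` filter combined with an `NGramTokenizer`.

```python
import random
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams of length n_min..n_max in a text.

    Assumption: plain whitespace tokenisation; a real dialect-aware
    pipeline would need proper Arabic tokenisation and normalisation.
    """
    tokens = text.split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def undersample(samples, seed=0):
    """Balance (author, text) pairs by randomly keeping k texts per author.

    k is the number of texts of the author with the fewest texts, so every
    author ends up equally represented (a hypothetical balancing strategy).
    """
    by_author = {}
    for author, text in samples:
        by_author.setdefault(author, []).append(text)
    k = min(len(texts) for texts in by_author.values())
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    balanced = []
    for author, texts in sorted(by_author.items()):
        for text in rng.sample(texts, k):
            balanced.append((author, text))
    return balanced


if __name__ == "__main__":
    # Placeholder data, not drawn from the study's corpus.
    tweets = [("A", "first tweet text"), ("A", "second tweet"),
              ("A", "third one"), ("B", "another tweet text")]
    print(undersample(tweets))
    print(word_ngrams("another tweet text"))
```

Varying `n_min` and `n_max` is one simple way to explore different word-gram settings, and comparing results on the original and undersampled data mirrors, in spirit, the unbalanced versus balanced comparison the abstract mentions.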
Anthology ID: 2020.icon-main.34
Volume: Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2020
Address: Indian Institute of Technology Patna, Patna, India
Editors: Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 257–260
URL: https://aclanthology.org/2020.icon-main.34
Cite (ACL): Mashael AlAmr and Eric Atwell. 2020. WEKA in Forensic Authorship Analysis: A corpus-based approach of Saudi Authors. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 257–260, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal): WEKA in Forensic Authorship Analysis: A corpus-based approach of Saudi Authors (AlAmr & Atwell, ICON 2020)
PDF: https://aclanthology.org/2020.icon-main.34.pdf