Email Threat Detection Using Distinct Neural Network Approaches

Esteban Castillo, Sreekar Dhaduvai, Peng Liu, Kartik-Singh Thakur, Adam Dalton, Tomek Strzalkowski


Abstract
This paper describes several approaches to detecting malicious content in email interactions through a combination of machine learning and natural language processing tools. Specifically, several neural network designs are tested on word embedding representations to detect suspicious messages and separate them from benign email. The proposed approaches are trained and tested on distinct email collections, including datasets constructed from publicly available corpora (such as Enron and APWG) as well as several smaller, non-public datasets used in recent government evaluations. Experimental results show that networks trained with back-propagation, both with and without recurrent neural layers, outperform current state-of-the-art techniques that use supervised learning algorithms with stylometric features of text. Our results also demonstrate that word embedding vectors are an effective means of capturing certain aspects of text meaning, which non-linear neural networks can exploit to obtain highly accurate detection of malicious emails based on email text alone.
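The general idea of classifying email text from word embeddings with a network trained by back-propagation can be illustrated with a minimal sketch. This is not the authors' architecture: the three-dimensional embedding table, the toy training messages, the averaging of word vectors, and the names `embed`, `TinyNet`, `train`, and `classify` are all assumptions made for illustration, and the network has a single hidden layer with no recurrent layers.

```python
import math
import random

# Hypothetical 3-dimensional embeddings for illustration only; a real system
# would load pretrained vectors with far more dimensions.
EMBEDDINGS = {
    "verify": [0.9, 0.1, 0.0], "password": [0.8, 0.2, 0.1],
    "account": [0.7, 0.3, 0.0], "urgent": [0.9, 0.0, 0.2],
    "meeting": [0.1, 0.9, 0.1], "lunch": [0.0, 0.8, 0.3],
    "report": [0.2, 0.9, 0.0], "tomorrow": [0.1, 0.7, 0.2],
}
DIM, HIDDEN = 3, 4


def embed(text):
    """Average the embeddings of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class TinyNet:
    """One tanh hidden layer and a sigmoid output, trained by back-propagation."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)]
                   for _ in range(HIDDEN)]
        self.b1 = [0.0] * HIDDEN
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr=0.5):
        h, p = self.forward(x)
        d_out = p - y  # gradient of cross-entropy loss w.r.t. the output logit
        for j in range(HIDDEN):
            # Back-propagate through tanh before updating the output weight.
            d_h = d_out * self.w2[j] * (1.0 - h[j] ** 2)
            self.w2[j] -= lr * d_out * h[j]
            for i in range(DIM):
                self.w1[j][i] -= lr * d_h * x[i]
            self.b1[j] -= lr * d_h
        self.b2 -= lr * d_out


def train(examples, epochs=1000, seed=0):
    """Fit a TinyNet on (text, label) pairs with per-example gradient steps."""
    net = TinyNet(seed)
    data = [(embed(text), label) for text, label in examples]
    for _ in range(epochs):
        for x, y in data:
            net.train_step(x, y)
    return net


def classify(net, text):
    """Return 1 for suspicious and 0 for benign."""
    _, p = net.forward(embed(text))
    return 1 if p > 0.5 else 0


# Toy labelled messages (assumed data): 1 = suspicious, 0 = benign.
TRAIN = [
    ("urgent verify account password", 1),
    ("verify password", 1),
    ("meeting tomorrow lunch", 0),
    ("report meeting", 0),
]
```

Calling `net = train(TRAIN)` and then `classify(net, "urgent account")` labels a message using only its text. Averaging word vectors discards word order, which is the kind of information a recurrent layer could retain; the sketch omits that for brevity.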
Anthology ID:
2020.stoc-1.8
Volume:
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | STOC | WS
Publisher:
European Language Resources Association
Pages:
48–55
Language:
English
URL:
https://aclanthology.org/2020.stoc-1.8
PDF:
https://aclanthology.org/2020.stoc-1.8.pdf