Sreekar Dhaduvai


2020

pdf bib
Active Defense Against Social Engineering: The Case for Human Language Technology
Adam Dalton | Ehsan Aghaei | Ehab Al-Shaer | Archna Bhatia | Esteban Castillo | Zhuo Cheng | Sreekar Dhaduvai | Qi Duan | Bryanna Hebenstreit | Md Mazharul Islam | Younes Karimi | Amir Masoumzadeh | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

We describe a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. The system processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation and dialogue generation. The novelty of the system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker’s time and resources.

pdf bib
Email Threat Detection Using Distinct Neural Network Approaches
Esteban Castillo | Sreekar Dhaduvai | Peng Liu | Kartik-Singh Thakur | Adam Dalton | Tomek Strzalkowski
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

This paper describes different approaches to detect malicious content in email interactions through a combination of machine learning and natural language processing tools. Specifically, several neural network designs are tested on word embedding representations to detect suspicious messages and separate them from non-suspicious, benign email. The proposed approaches are trained and tested on distinct email collections, including datasets constructed from publicly available corpora (such as Enron, APWG, etc.) as well as several smaller, non-public datasets used in recent government evaluations. Experimental results show that back-propagation both with and without recurrent neural layers outperforms current state of the art techniques that include supervised learning algorithms with stylometric elements of texts as features. Our results also demonstrate that word embedding vectors are effective means for capturing certain aspects of text meaning that can be teased out through machine learning in non-linear/complex neural networks, in order to obtain highly accurate detection of malicious emails based on email text alone.