Interpretable Identification of Cybersecurity Vulnerabilities from News Articles

Pierre Frode de la Foret, Stefan Ruseti, Cristian Sandescu, Mihai Dascalu, Sebastien Travadel


Abstract
As technology adoption grows, more and more systems become targets of information security breaches. A substantial number of news outlets and social media accounts report emerging vulnerabilities and threats, which can help identify zero-day vulnerabilities early. However, analysts often spend considerable time searching through these decentralized sources of information to keep countermeasures and patches for their organisation’s information systems up to date. Various automated processing pipelines grounded in Natural Language Processing techniques for text classification have been introduced for the early identification of vulnerabilities from Open-Source Intelligence (OSINT) data, including news websites, blogs, and social media. In this study, we consider a corpus of more than 1600 labeled news articles and introduce an interpretable approach to early cyberthreat detection. In particular, interpretable classification is performed using the Longformer architecture alongside prototypes from the ProSeNet structure, after a preliminary analysis of the Transformer’s encoding capabilities. The best interpretable architecture achieves an 88% F2-Score, supporting the system’s applicability in real-life monitoring of OSINT data.
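The F2-Score reported in the abstract is the F-beta measure with beta = 2, which weights recall more heavily than precision; this suits monitoring settings where a missed vulnerability report costs more than a false alarm. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `fbeta_from_counts` and the confusion counts are hypothetical and chosen purely for illustration.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives, and false negatives.

    Uses the count form of the standard definition:
        F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    With beta = 2, each false negative is penalised four times as much as
    a false positive.
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        # No positive labels and no positive predictions: score defined as 0 here.
        return 0.0
    return (1 + b2) * tp / denominator


# Hypothetical counts (not from the paper): precision 0.6, recall 0.9.
f2 = fbeta_from_counts(tp=9, fp=6, fn=1, beta=2.0)
f1 = fbeta_from_counts(tp=9, fp=6, fn=1, beta=1.0)
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

With these illustrative counts, recall is higher than precision, so F2 comes out above F1, which shows how the metric rewards a classifier that rarely misses relevant articles.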
Anthology ID:
2021.ranlp-1.49
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
428–436
URL:
https://aclanthology.org/2021.ranlp-1.49
Cite (ACL):
Pierre Frode de la Foret, Stefan Ruseti, Cristian Sandescu, Mihai Dascalu, and Sebastien Travadel. 2021. Interpretable Identification of Cybersecurity Vulnerabilities from News Articles. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 428–436, Held Online. INCOMA Ltd.
Cite (Informal):
Interpretable Identification of Cybersecurity Vulnerabilities from News Articles (Frode de la Foret et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.49.pdf