MalwareTextDB: A Database for Annotated Malware Articles

Swee Kiat Lim, Aldrian Obaja Muis, Wei Lu, Chen Hui Ong


Abstract
Cybersecurity risks and malware threats are becoming increasingly dangerous and common. Despite the severity of the problem, there have been few NLP efforts focused on tackling cybersecurity. In this paper, we discuss the construction of a new database for annotated malware texts. An annotation framework is introduced based on the MAEC vocabulary for defining malware characteristics, along with a database consisting of 39 annotated APT reports with a total of 6,819 sentences. We also use the database to construct models that can potentially help cybersecurity researchers in their data collection and analytics efforts.
Anthology ID:
P17-1143
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1557–1567
URL:
https://aclanthology.org/P17-1143
DOI:
10.18653/v1/P17-1143
Cite (ACL):
Swee Kiat Lim, Aldrian Obaja Muis, Wei Lu, and Chen Hui Ong. 2017. MalwareTextDB: A Database for Annotated Malware Articles. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1557–1567, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
MalwareTextDB: A Database for Annotated Malware Articles (Lim et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1143.pdf
Note:
 P17-1143.Notes.zip
Dataset:
 P17-1143.Datasets.zip