Predicting Malware Attributes from Cybersecurity Texts

Arpita Roy, Youngja Park, Shimei Pan


Abstract
Text analytics is a useful tool for studying malware behavior and tracking emerging threats. Automated malware attribute identification from cybersecurity texts is very challenging because of the large number of malware attribute labels and the small number of training instances. In this paper, we propose a novel feature learning method that leverages diverse knowledge sources, such as a small amount of human annotation, unlabeled text, and specifications of malware attribute labels. Our evaluation demonstrates the effectiveness of our method over state-of-the-art malware attribute prediction systems.
Anthology ID:
N19-1293
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2857–2861
URL:
https://aclanthology.org/N19-1293/
DOI:
10.18653/v1/N19-1293
Cite (ACL):
Arpita Roy, Youngja Park, and Shimei Pan. 2019. Predicting Malware Attributes from Cybersecurity Texts. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2857–2861, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Predicting Malware Attributes from Cybersecurity Texts (Roy et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1293.pdf