A Pretrained Language Model for Cyber Threat Intelligence

Youngja Park, Weiqiu You

Abstract
We present a new BERT model for the cybersecurity domain, CTI-BERT, which can improve the accuracy of cyber threat intelligence (CTI) extraction, enabling organizations to better defend against potential cyber threats. We provide detailed information about the domain corpus collection, the training methodology, and its effectiveness on a variety of NLP tasks in the cybersecurity domain. The experiments show that CTI-BERT significantly outperforms several general-domain and security-domain models on these cybersecurity applications, indicating that the training data and methodology have a significant impact on model performance.
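For readers who want to experiment, a BERT-style encoder like CTI-BERT can typically be loaded with the Hugging Face transformers library. The sketch below is a minimal illustration, not code from the paper; the Hub identifier "ibm-research/CTI-BERT" is an assumption and should be checked against the authors' release.

```python
# Minimal sketch: load a domain-specific BERT encoder with Hugging Face
# transformers and fill in a masked token in a CTI-flavored sentence.
# The model identifier below is an assumption, not confirmed by this page.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "ibm-research/CTI-BERT"  # hypothetical Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Example sentence with one masked token.
text = f"The malware exfiltrates data over an encrypted {tokenizer.mask_token} channel."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the highest-scoring vocabulary token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```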
Anthology ID:
2023.emnlp-industry.12
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
113–122
URL:
https://aclanthology.org/2023.emnlp-industry.12
DOI:
10.18653/v1/2023.emnlp-industry.12
Cite (ACL):
Youngja Park and Weiqiu You. 2023. A Pretrained Language Model for Cyber Threat Intelligence. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 113–122, Singapore. Association for Computational Linguistics.
Cite (Informal):
A Pretrained Language Model for Cyber Threat Intelligence (Park & You, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.12.pdf