Hieu Man Duc Trong


2020

Introducing a New Dataset for Event Detection in Cybersecurity Texts
Hieu Man Duc Trong | Duc Trong Le | Amir Pouran Ben Veyseh | Thuat Nguyen | Thien Huu Nguyen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Detecting cybersecurity events is necessary to keep us informed about the fast-growing number of such events reported in text. In this work, we focus on the task of event detection (ED) to identify event trigger words for the cybersecurity domain. In particular, to facilitate future research, we introduce a new dataset for this problem, featuring manual annotation for 30 important cybersecurity event types and a dataset size large enough to develop deep learning models. Compared to prior datasets for this task, our dataset involves more event types and supports the modeling of document-level information to improve performance. We perform an extensive evaluation of the current state-of-the-art methods for ED on the proposed dataset. Our experiments reveal the challenges of cybersecurity ED and present many research opportunities in this area for future work.