Named Entity Recognition in Information Security Domain for Russian

Anastasiia Sirotina, Natalia Loukachevitch


Abstract
In this paper we discuss the named entity recognition task for Russian texts related to cybersecurity. First, we describe the problems that arise in the course of labeling unstructured texts from the information security domain. We introduce guidelines for human annotators, according to which a corpus has been annotated. Then, a CRF-based system and several neural architectures are implemented and applied to the corpus. The named entity recognition systems are evaluated and compared to determine the most efficient one.
Anthology ID:
R19-1128
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1114–1120
URL:
https://aclanthology.org/R19-1128/
DOI:
10.26615/978-954-452-056-4_128
Cite (ACL):
Anastasiia Sirotina and Natalia Loukachevitch. 2019. Named Entity Recognition in Information Security Domain for Russian. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1114–1120, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Named Entity Recognition in Information Security Domain for Russian (Sirotina & Loukachevitch, RANLP 2019)
PDF:
https://aclanthology.org/R19-1128.pdf