Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers

Utpal Kumar Sikdar, Biswanath Barik, Björn Gambäck


Abstract
Cybersecurity risks such as malware threaten the personal safety of users, but identifying malware text is a major challenge. The paper proposes a supervised learning approach to identifying malware sentences in a given document (subTask1 of SemEval-2018 Task 8), as well as to classifying malware tokens in those sentences (subTask2). The approach ranked second of twelve participants on both subtasks, with F-scores of 57% for subTask1 and 28% for subTask2.
Anthology ID:
S18-1144
Volume:
Proceedings of the 12th International Workshop on Semantic Evaluation
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marianna Apidianaki, Saif M. Mohammad, Jonathan May, Ekaterina Shutova, Steven Bethard, Marine Carpuat
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
890–893
URL:
https://aclanthology.org/S18-1144
DOI:
10.18653/v1/S18-1144
Cite (ACL):
Utpal Kumar Sikdar, Biswanath Barik, and Björn Gambäck. 2018. Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 890–893, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers (Sikdar et al., SemEval 2018)
PDF:
https://aclanthology.org/S18-1144.pdf