@inproceedings{almodovar-etal-2022-language,
title = "Can Language Models Help in System Security? Investigating Log Anomaly Detection using {BERT}",
author = "Almodovar, Crispin and
Sabrina, Fariza and
Karimi, Sarvnaz and
Azad, Salahuddin",
editor = "Parameswaran, Pradeesh and
Biggs, Jennifer and
Powers, David",
booktitle = "Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association",
month = dec,
year = "2022",
address = "Adelaide, Australia",
publisher = "Australasian Language Technology Association",
url = "https://aclanthology.org/2022.alta-1.19",
pages = "139--147",
abstract = "The log files generated by networked computer systems contain valuable information that can be used to monitor system security and stability. Recently, techniques based on Deep Learning and Natural Language Processing have been proven effective in detecting anomalous activities from system logs. The current approaches, however, have limited practical application because they rely on log templates which cannot handle variability in log content, or they require supervised training to be effective. In this paper, a novel log anomaly detection approach named LogFiT is proposed. The LogFiT model inherits the linguistic {``}knowledge{''} encoded within a pretrained BERT-based language model and fine-tunes it towards learning the linguistic structure of system logs. The LogFiT model is trained in a self-supervised manner using normal log data only. Using masked token prediction and centroid distance minimisation as training objectives, the LogFiT model learns to recognise the linguistic patterns associated with the normal log data. During inference, a discriminator function uses the LogFiT model{'}s top-k token prediction accuracy and computed centroid distance to determine if the input is normal or anomaly. Experiments show that LogFiT{'}s F1 score and specificity exceeds that of baseline models on the HDFS dataset and comparable on the BGL dataset.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="almodovar-etal-2022-language">
<titleInfo>
<title>Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT</title>
</titleInfo>
<name type="personal">
<namePart type="given">Crispin</namePart>
<namePart type="family">Almodovar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fariza</namePart>
<namePart type="family">Sabrina</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sarvnaz</namePart>
<namePart type="family">Karimi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salahuddin</namePart>
<namePart type="family">Azad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association</title>
</titleInfo>
<name type="personal">
<namePart type="given">Pradeesh</namePart>
<namePart type="family">Parameswaran</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jennifer</namePart>
<namePart type="family">Biggs</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Powers</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Australasian Language Technology Association</publisher>
<place>
<placeTerm type="text">Adelaide, Australia</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The log files generated by networked computer systems contain valuable information that can be used to monitor system security and stability. Recently, techniques based on Deep Learning and Natural Language Processing have been proven effective in detecting anomalous activities from system logs. The current approaches, however, have limited practical application because they rely on log templates, which cannot handle variability in log content, or they require supervised training to be effective. In this paper, a novel log anomaly detection approach named LogFiT is proposed. The LogFiT model inherits the linguistic “knowledge” encoded within a pretrained BERT-based language model and fine-tunes it towards learning the linguistic structure of system logs. The LogFiT model is trained in a self-supervised manner using normal log data only. Using masked token prediction and centroid distance minimisation as training objectives, the LogFiT model learns to recognise the linguistic patterns associated with the normal log data. During inference, a discriminator function uses the LogFiT model’s top-k token prediction accuracy and computed centroid distance to determine whether the input is normal or anomalous. Experiments show that LogFiT’s F1 score and specificity exceed those of baseline models on the HDFS dataset and are comparable on the BGL dataset.</abstract>
<identifier type="citekey">almodovar-etal-2022-language</identifier>
<location>
<url>https://aclanthology.org/2022.alta-1.19</url>
</location>
<part>
<date>2022-12</date>
<extent unit="page">
<start>139</start>
<end>147</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT
%A Almodovar, Crispin
%A Sabrina, Fariza
%A Karimi, Sarvnaz
%A Azad, Salahuddin
%Y Parameswaran, Pradeesh
%Y Biggs, Jennifer
%Y Powers, David
%S Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
%D 2022
%8 December
%I Australasian Language Technology Association
%C Adelaide, Australia
%F almodovar-etal-2022-language
%X The log files generated by networked computer systems contain valuable information that can be used to monitor system security and stability. Recently, techniques based on Deep Learning and Natural Language Processing have been proven effective in detecting anomalous activities from system logs. The current approaches, however, have limited practical application because they rely on log templates, which cannot handle variability in log content, or they require supervised training to be effective. In this paper, a novel log anomaly detection approach named LogFiT is proposed. The LogFiT model inherits the linguistic “knowledge” encoded within a pretrained BERT-based language model and fine-tunes it towards learning the linguistic structure of system logs. The LogFiT model is trained in a self-supervised manner using normal log data only. Using masked token prediction and centroid distance minimisation as training objectives, the LogFiT model learns to recognise the linguistic patterns associated with the normal log data. During inference, a discriminator function uses the LogFiT model’s top-k token prediction accuracy and computed centroid distance to determine whether the input is normal or anomalous. Experiments show that LogFiT’s F1 score and specificity exceed those of baseline models on the HDFS dataset and are comparable on the BGL dataset.
%U https://aclanthology.org/2022.alta-1.19
%P 139-147
Markdown (Informal)
[Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT](https://aclanthology.org/2022.alta-1.19) (Almodovar et al., ALTA 2022)
ACL
Crispin Almodovar, Fariza Sabrina, Sarvnaz Karimi, and Salahuddin Azad. 2022. [Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT](https://aclanthology.org/2022.alta-1.19). In *Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association*, pages 139–147, Adelaide, Australia. Australasian Language Technology Association.
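
The abstract above describes LogFiT's inference-time discriminator: a log line is judged by how well the fine-tuned masked language model predicts its masked tokens (top-k accuracy) and by how far its embedding sits from a centroid computed over normal logs. The following is a minimal Python sketch of that scoring idea only; it is not the authors' released code, and the base checkpoint (bert-base-uncased), first-token pooling, mask fraction, thresholds, and function names are illustrative assumptions.

# Illustrative sketch of the scoring idea described in the abstract (top-k masked-token
# prediction accuracy plus centroid distance). Not the authors' LogFiT implementation;
# checkpoint, pooling choice, mask fraction, and thresholds are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")      # assumed BERT-based model
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def pooled_embedding(log_line: str) -> torch.Tensor:
    """Embed a log line as the final-layer hidden state of its first token (pooling is an assumption)."""
    enc = tokenizer(log_line, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    return out.hidden_states[-1][0, 0]                              # shape: (hidden_dim,)


def compute_centroid(normal_lines: list[str]) -> torch.Tensor:
    """Centroid of embeddings of normal training logs: the reference point for 'normal'."""
    return torch.stack([pooled_embedding(line) for line in normal_lines]).mean(dim=0)


def top_k_accuracy(log_line: str, k: int = 5, mask_fraction: float = 0.15) -> float:
    """Mask a fraction of tokens and measure how often the true token is in the model's top-k guesses."""
    enc = tokenizer(log_line, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"].clone()
    candidates = torch.arange(1, input_ids.size(1) - 1)             # skip [CLS]/[SEP]
    n_mask = max(1, int(mask_fraction * len(candidates)))
    positions = candidates[torch.randperm(len(candidates))[:n_mask]]
    true_ids = input_ids[0, positions].clone()
    input_ids[0, positions] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=input_ids, attention_mask=enc["attention_mask"]).logits
    top_k = logits[0, positions].topk(k, dim=-1).indices            # shape: (n_mask, k)
    hits = (top_k == true_ids.unsqueeze(-1)).any(dim=-1).float()
    return hits.mean().item()


def is_anomaly(log_line: str, centroid: torch.Tensor,
               acc_threshold: float = 0.8, dist_threshold: float = 5.0) -> bool:
    """Flag a line whose tokens are hard to predict or whose embedding is far from the normal centroid."""
    far_from_normal = torch.norm(pooled_embedding(log_line) - centroid).item() > dist_threshold
    return top_k_accuracy(log_line) < acc_threshold or far_from_normal

In the approach the abstract describes, the language model would first be fine-tuned on normal logs with the masked-token-prediction and centroid-distance-minimisation objectives; values such as acc_threshold and dist_threshold above stand in for a decision rule that would be calibrated on held-out normal data.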