Applying Contrastive Learning to Code Vulnerability Type Classification

Chen Ji, Su Yang, Hongyu Sun, Yuqing Zhang


Abstract
Vulnerability classification is a crucial task in software security analysis, essential for identifying and mitigating potential security risks. Learning-based methods often perform poorly because vulnerability classification datasets follow a long-tail distribution. Recent approaches try to address this problem but treat each CWE class in isolation, ignoring the relationships between classes. The resulting code vector representations do not scale, and performance drops significantly on complex real-world vulnerabilities. We propose a hierarchical contrastive learning framework for code vulnerability type classification that brings the vector representations of related CWEs closer together. To address class collapse and improve model robustness, we mix a self-supervised contrastive learning loss into our loss function. Additionally, we employ max-pooling so that the model can handle longer vulnerability code inputs. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods by 2.97%-17.90% in accuracy and 0.98%-22.27% in weighted-F1, with even better performance on higher-quality datasets. An ablation study confirms each component’s contribution. These findings underscore the potential and advantages of our approach for multi-class vulnerability classification.
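The abstract names two mechanisms that a short sketch may make more concrete: max-pooling over a long input and mixing a self-supervised contrastive loss into the training objective. The Python below is an illustration only, not the authors' implementation. The function names, the InfoNCE-style loss form, the temperature, and the mixing weight are all assumptions introduced here, and the hierarchical weighting of related CWEs is not reproduced. Vectors are assumed to be non-zero.

```python
import math


def max_pool_chunks(chunk_vectors):
    """Element-wise max over per-chunk embeddings (hypothetical helper).

    Assumes a long code input was split into chunks and each chunk was
    encoded into a vector of the same length.
    """
    if not chunk_vectors:
        raise ValueError("need at least one chunk vector")
    return [max(dims) for dims in zip(*chunk_vectors)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss, averaged over the positives (assumed form).

    In a supervised setting the positives would share (or be related to)
    the anchor's CWE label; in a self-supervised setting they would be
    augmented views of the anchor itself.
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)


def mixed_loss(supervised, self_supervised, weight=0.5):
    """Convex combination of the two loss terms; the weight is an assumption."""
    return weight * supervised + (1.0 - weight) * self_supervised


if __name__ == "__main__":
    pooled = max_pool_chunks([[0.1, 0.9, -0.3], [0.4, 0.2, 0.5]])
    sup = contrastive_loss(pooled, [[0.5, 0.8, 0.4]], [[-0.9, 0.1, -0.2]])
    ssl = contrastive_loss(pooled, [[0.4, 0.9, 0.6]], [[-0.9, 0.1, -0.2]])
    print(pooled, mixed_loss(sup, ssl))
```

Pooling across chunks keeps the representation at a fixed size regardless of input length, which is one plausible reading of how max-pooling lets a model accept longer code; the abstract does not state how the authors split or encode the input.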
Anthology ID:
2024.emnlp-main.666
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11942–11952
URL:
https://aclanthology.org/2024.emnlp-main.666
Cite (ACL):
Chen Ji, Su Yang, Hongyu Sun, and Yuqing Zhang. 2024. Applying Contrastive Learning to Code Vulnerability Type Classification. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11942–11952, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Applying Contrastive Learning to Code Vulnerability Type Classification (Ji et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.666.pdf