A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Keke Lian; Wang Bin; Lei Zhang; Libo Chen; Junjie Wang; Ziming Zhao; Yujiu Yang; Miaoqian Lin; Haotong Duan; Haoran Zhao; Shuang Liao; Mingda Guo; Quan Jiazheng; Yilu Zhong; Chenhao He; Chen Zichuan; Jie Wu; Haoling Li; Zhaoxuan Li; Jiongchi Yu; Hui LI; Dong Zhang

Abstract

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI-assisted programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity in repository-level scenarios presents challenges for LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation and help developers identify the most suitable models for practical tasks. They also lay the groundwork for refining LLMs to generate secure and efficient code in real-world applications.

Anthology ID:: 2026.findings-acl.1569
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 31390–31405
URL:: https://aclanthology.org/2026.findings-acl.1569/
Cite (ACL):: Keke Lian, Wang Bin, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Miaoqian Lin, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Quan Jiazheng, Yilu Zhong, Chenhao He, Chen Zichuan, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui LI, and Dong Zhang. 2026. A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31390–31405, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code (Lian et al., Findings 2026)
PDF:: https://aclanthology.org/2026.findings-acl.1569.pdf
Checklist:: 2026.findings-acl.1569.checklist.pdf
