CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

Xing Ma; Yangjie Zhou; Wu Sun; Zihan Liu; Jingwen Leng; Yun Lin; Shixuan Sun; Minyi Guo; Jin Song Dong

Abstract

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention. We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift–transfer–lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.

Anthology ID: 2026.acl-long.500
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 10929–10946
URL: https://aclanthology.org/2026.acl-long.500/
Cite (ACL): Xing Ma, Yangjie Zhou, Wu Sun, Zihan Liu, Jingwen Leng, Yun Lin, Shixuan Sun, Minyi Guo, and Jin Song Dong. 2026. CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10929–10946, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels (Ma et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.500.pdf
Checklist: 2026.acl-long.500.checklist.pdf
