Palanivel Kodeswaran
2025
Transforming Code Understanding: Clustering-Based Retrieval for Improved Summarization in Domain-Specific Languages
Baban Gain | Dibyanayan Bandyopadhyay | Samrat Mukherjee | Aryan Sahoo | Saswati Dana | Palanivel Kodeswaran | Sayandeep Sen | Asif Ekbal | Dinesh Garg
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Extended Berkeley Packet Filter (eBPF), a domain-specific extension of the C language, has gained widespread acceptance in the cloud community for tasks including observability, security, and network acceleration. Because eBPF is recent and complex, practitioners and developers have a pressing need for natural language summaries of existing eBPF code (particularly open-source code), which would ease both understanding and the development of new code. However, because eBPF is a niche Domain-Specific Language (DSL), training data is scarce. In this paper, we investigate the effectiveness of LLMs for summarizing low-resource DSLs in the context of eBPF code. Specifically, we propose a clustering-based technique that retrieves in-context examples semantically closer to the test example, together with a simple yet powerful prompt design that yields higher-quality code summaries. Experimental results show that our proposed retrieval approach for prompt generation improves eBPF code summarization accuracy by up to 12.9 BLEU points over other prompting techniques. The code is available at https://github.com/babangain/ebpf_summ.
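The general idea of clustering-based retrieval of in-context examples can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding model, clustering algorithm, or prompt template, so the bag-of-words embedding, the k-means-style clustering with cosine similarity, the function names, and the toy `pool` of snippets below are all assumptions made for illustration.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words vector (assumed stand-in for a real code embedding)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iters=10):
    """k-means-style clustering using cosine similarity.
    The first k vectors seed the centroids, so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def retrieve(query, pool, k_clusters=2, n_shots=2):
    """Return up to n_shots (code, summary) pairs from the cluster nearest the query."""
    vocab = sorted({w for code, _ in pool for w in code.lower().split()})
    vectors = [embed(code, vocab) for code, _ in pool]
    centroids, labels = kmeans(vectors, k_clusters)
    q = embed(query, vocab)
    best = max(range(k_clusters), key=lambda c: cosine(q, centroids[c]))
    members = [i for i, lab in enumerate(labels) if lab == best]
    # Within the chosen cluster, rank examples by similarity to the query.
    members.sort(key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [pool[i] for i in members[:n_shots]]

def build_prompt(query, shots):
    """Assemble a few-shot prompt (hypothetical template)."""
    parts = [f"Code:\n{code}\nSummary: {summary}\n" for code, summary in shots]
    parts.append(f"Code:\n{query}\nSummary:")
    return "\n".join(parts)

# Hypothetical toy pool: whitespace-tokenized stand-ins for eBPF programs.
pool = [
    ("xdp drop packet if ip source blocked", "Drops packets from blocked IPs."),
    ("kprobe trace sys_open and record pid", "Traces open syscalls per process."),
    ("xdp pass packet and count bytes per ip", "Counts bytes per source IP."),
    ("kprobe trace sys_read and record latency", "Measures read syscall latency."),
]

query = "xdp drop packet if port blocked"
shots = retrieve(query, pool)
prompt = build_prompt(query, shots)
```

In this toy setup the query is matched to the cluster of XDP-style snippets, so only those examples are placed in the prompt. A realistic pipeline would likely replace `embed` with a learned code embedding and send `prompt` to an LLM.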