Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement

Yunlong Feng, Dechuan Teng, Yang Xu, Honglin Mu, Xiao Xu, Libo Qin, Qingfu Zhu, Wanxiang Che
Abstract
Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Constructed Context Decompilation (sc2dec) method recompiles the LLM’s decompilation results to construct pairs for in-context learning, helping the model improve decompilation performance. (2) Fine-grained Alignment Enhancement (FAE), which meticulously aligns assembly code with source code at the statement level by leveraging debugging information, is employed during the fine-tuning phase to achieve further improvements in decompilation. By integrating these two methods, we achieved a Re-Executability performance improvement of approximately 3.90% on the Decompile-Eval benchmark, establishing a new state-of-the-art performance of 52.41%. The code, data, and models are available at https://github.com/AlongWY/sccdec.
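The self-constructed context idea from the abstract can be sketched as a small loop: the model's first-pass decompilation is recompiled, and the resulting (assembly, source) pair is prepended as an in-context demonstration before the model is asked to decompile the original target again. This is a minimal illustration only; `decompile` and `compile_to_asm` are hypothetical stand-ins for an LLM call and a compiler invocation, and the prompt layout is an assumption, not the authors' exact format.

```python
def build_self_constructed_prompt(target_asm, decompile, compile_to_asm):
    """Build an in-context-learning prompt from the model's own draft output.

    decompile:      callable mapping assembly -> source (e.g., an LLM call)
    compile_to_asm: callable mapping source -> assembly (e.g., a compiler)
    """
    draft_src = decompile(target_asm)      # first-pass decompilation
    draft_asm = compile_to_asm(draft_src)  # recompile the draft source
    # The recompiled assembly and its known source form an aligned
    # demonstration pair; the original target is appended as the task.
    return (
        f"# Example\n# Assembly:\n{draft_asm}\n# Source:\n{draft_src}\n"
        f"# Task\n# Assembly:\n{target_asm}\n# Source:\n"
    )

# Toy stand-ins for illustration only:
prompt = build_self_constructed_prompt(
    "mov eax, 1\nret",
    decompile=lambda asm: "int f(void) { return 1; }",
    compile_to_asm=lambda src: "mov eax, 1\nret",
)
```

Because the demonstration pair is generated from the model's own output, no external parallel corpus is needed at inference time, which is what makes the method fine-tuning-free.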
Anthology ID: 2024.findings-emnlp.385
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6603–6614
URL: https://aclanthology.org/2024.findings-emnlp.385/
DOI: 10.18653/v1/2024.findings-emnlp.385
Cite (ACL): Yunlong Feng, Dechuan Teng, Yang Xu, Honglin Mu, Xiao Xu, Libo Qin, Qingfu Zhu, and Wanxiang Che. 2024. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6603–6614, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement (Feng et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.385.pdf
Software: 2024.findings-emnlp.385.software.zip
Data: 2024.findings-emnlp.385.data.zip