Ning Liu
2024
Enhancing Learning-Based Binary Code Similarity Detection Model through Adversarial Training with Multiple Function Variants
Lichen Jia
|
Chenggang Wu
|
Bowen Tang
|
Peihua Zhang
|
Zihan Jiang
|
Yang Yang
|
Ning Liu
|
Jingfeng Zhang
|
Zhe Wang
Findings of the Association for Computational Linguistics: EMNLP 2024
Compared to identifying binary versions of the same function under different compilation options, existing Learning-Based Binary Code Similarity Detection (LB-BCSD) methods exhibit lower accuracy in recognizing functions with the same functionality but different implementations. To address this issue, we introduces an adversarial attack method called FuncFooler, which focuses on perturbing critical code to generate multiple variants of the same function. These variants are then used to retrain the model to enhance its robustness. Current adversarial attacks against LB-BCSD mainly draw inspiration from the FGSM (Fast Gradient Sign Method) method in the image domain, which involves generating adversarial bytes and appending them to the end of the executable file. However, this approach has a significant drawback: the appended bytes do not affect the actual code of the executable file, thus failing to create diverse code variants. To overcome this limitation, we proposes a gradient-guided adversarial attack method based on critical code—FuncFooler. This method designs a series of strategies to perturb the code while preserving the program’s semantics. Specifically, we first utilizes gradient information to locate critical nodes in the control flow graph. Then, fine-grained perturbations are applied to these nodes, including control flow, data flow, and internal node perturbations, to obtain adversarial samples. The experimental results show that the application of the FuncFooler method can increase the accuracy of the latest LB-BCSD model by 5%-7%.
2022
Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization
Sanjeev Kumar Karn
|
Ning Liu
|
Hinrich Schuetze
|
Oladimeji Farri
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The IMPRESSIONS section of a radiology report about an imaging study is a summary of the radiologist’s reasoning and conclusions, and it also aids the referring physician in confirming or excluding certain diagnoses. A cascade of tasks are required to automatically generate an abstractive summary of the typical information-rich radiology report. These tasks include acquisition of salient content from the report and generation of a concise, easily consumable IMPRESSIONS section. Prior research on radiology report summarization has focused on single-step end-to-end models – which subsume the task of salient content acquisition. To fully explore the cascade structure and explainability of radiology report summarization, we introduce two innovations. First, we design a two-step approach: extractive summarization followed by abstractive summarization. Second, we additionally break down the extractive part into two independent tasks: extraction of salient (1) sentences and (2) keywords. Experiments on English radiology reports from two clinical sites show our novel approach leads to a more precise summary compared to single-step and to two-step-with-single-extractive-process baselines with an overall improvement in F1 score of 3-4%.
Search
Fix data
Co-authors
- Oladimeji Farri 1
- Lichen Jia 1
- Zihan Jiang 1
- Sanjeev Kumar Karn 1
- Hinrich Schütze 1
- show all...