Towards Reverse Engineering of Language Models: A Survey

Xinpeng Ti; Wentao Ye; Zhifang Zhang; Junbo Zhao; Chang Yao; Lei Feng; Haobo Wang

Abstract

With the continuous development of language models and the widespread availability of accessible interfaces, large language models (LLMs) have been applied in a growing number of fields. However, because model development requires vast amounts of data and computational resources, protecting a model’s parameters and training data has become an urgent concern. The novel training and application paradigms of LLMs have also given rise to many new attacks on language models in recent years. In this paper, we define these attacks as “reverse engineering” (RE) techniques on LMs and provide an in-depth analysis of them. We illustrate the reverse engineering methods applied to different aspects of a model and introduce existing protective strategies. On the one hand, this survey demonstrates that even black-box models are vulnerable to different types of attacks; on the other hand, it offers a more holistic perspective for developing new protective strategies.

Anthology ID: 2025.findings-emnlp.395
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7483–7502
URL: https://aclanthology.org/2025.findings-emnlp.395/
Cite (ACL): Xinpeng Ti, Wentao Ye, Zhifang Zhang, Junbo Zhao, Chang Yao, Lei Feng, and Haobo Wang. 2025. Towards Reverse Engineering of Language Models: A Survey. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7483–7502, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Towards Reverse Engineering of Language Models: A Survey (Ti et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.395.pdf
Checklist: 2025.findings-emnlp.395.checklist.pdf
