@inproceedings{savelli-etal-2025-malto-semeval,
title = "{MALTO} at {S}em{E}val-2025 Task 4: Dual Teachers for Unlearning Sensitive Content in {LLM}s",
author = "Savelli, Claudio and
Munis, Evren and
Bayat, Erfan and
Grieco, Andrea and
Giobergia, Flavio",
editor = "Rosenthal, Sara and
Ros{\'a}, Aiala and
Ghosh, Debanjan and
Zampieri, Marcos",
booktitle = "Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.semeval-1.229/",
pages = "1747--1752",
ISBN = "979-8-89176-273-2",
abstract = "Large language models (LLMs) may retain and reproduce sensitive information learned during training, posing significant privacy and ethical concerns. Once detected, this personal information should be deleted from the model. A naive answer could be to retrain these models from scratch when needed. However, this solution is unfeasible given the immense computational, economic, and environmental costs required to train these models. For this reason, Machine Unlearning (MU) has risen in recent years as an emerging field of research to efficiently delete specific information from a model{'}s knowledge. This paper presents our solution to the ``Unlearning sensitive content from Large Language Models'' shared task at SemEval-2025, which challenges researchers to develop effective LLM MU techniques. We adopt a Dual-Teacher framework that leverages a Competent and an Incompetent Teacher to erase unwanted information while selectively preserving model utility. Our approach adapts established computer vision unlearning methods to the sequential nature of language models through KL divergence minimization over next-token prediction probabilities. Our experimental results demonstrate that our method outperforms the state-of-the-art techniques."
}
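The abstract describes the dual-teacher objective only at a high level. The sketch below is purely illustrative, not the authors' released implementation: it shows one plausible way to write a dual-teacher unlearning loss over next-token distributions in PyTorch for a HuggingFace-style causal LM. All names (`dual_teacher_kl_loss`, `student`, `competent`, `incompetent`), the KL direction, and the equal weighting of the two terms are assumptions not specified by the abstract.

    # Illustrative sketch only -- not the paper's code. On the forget set the
    # student imitates a randomly initialized (incompetent) teacher; on the
    # retain set it imitates the original (competent) teacher. The KL is taken
    # over next-token prediction distributions, as the abstract describes.
    import torch
    import torch.nn.functional as F

    def dual_teacher_kl_loss(student, competent, incompetent,
                             forget_batch, retain_batch):
        """Batches are dicts accepted by a HuggingFace-style causal LM
        (e.g. input_ids, attention_mask). Teachers are kept frozen."""
        total = 0.0
        for batch, teacher in ((forget_batch, incompetent),
                               (retain_batch, competent)):
            s_log_probs = F.log_softmax(student(**batch).logits, dim=-1)
            with torch.no_grad():
                t_log_probs = F.log_softmax(teacher(**batch).logits, dim=-1)
            # KL(teacher || student), summed over positions and vocabulary,
            # averaged over the batch dimension.
            total = total + F.kl_div(s_log_probs, t_log_probs,
                                     log_target=True, reduction="batchmean")
        return total

A training loop would backpropagate this loss into the student only; how the two terms are weighted, and whether the KL direction should be reversed, are design details the abstract does not specify.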