LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives

Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, Sara Hooker


Abstract
The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs). To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying how the source of synthetic data shapes models’ internal biases, calibration and preferences, and their generations’ textual attributes, providing one of the most comprehensive studies to date. We find that models are surprisingly sensitive to certain attributes even when the synthetic data prompts appear “neutral”, which invites the question: can we explicitly steer the distilled data towards desired properties? We demonstrate how such active inheritance can steer the generation profiles of models towards desirable non-differentiable attributes in both directions, e.g., increasing lexical diversity or reducing toxicity. Overall, our study broadens the understanding of the implicit biases inherited by LLMs and explores how we can leverage them to positive effect.
Anthology ID:
2024.emnlp-main.521
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9243–9267
URL:
https://aclanthology.org/2024.emnlp-main.521
DOI:
10.18653/v1/2024.emnlp-main.521
Cite (ACL):
Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, and Sara Hooker. 2024. LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9243–9267, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives (Shimabucoro et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.521.pdf