Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang, Meng Wang


Abstract
Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continuous improvement in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability, i.e., how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on prior work in both efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method that helps mitigate sensitive words or hallucination. We provide a theoretical assumption as the rationale for our design, and evaluate it empirically through extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods, but also yield insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity, and causal effect, shedding light on future research.
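The abstract does not spell out the scoring formula, so the following is only a minimal, hypothetical sketch of one gradient-free way to score FFN neurons for a target caption token: each neuron's activation is multiplied by the projection of its output-weight row onto the target token's unembedding direction, and the top-scoring neurons are then damped as a crude edit. All tensor sizes and names (W_out, W_unembed, target_token) are illustrative stand-ins, not the authors' released code or model.

```python
# Hypothetical illustration (not the paper's implementation): gradient-free
# scoring of FFN neurons in one transformer layer, assuming a score of the form
# activation * (output-weight row projected onto the target token's unembedding).
import torch

torch.manual_seed(0)
d_model, d_ffn, vocab = 64, 256, 1000          # toy sizes, chosen arbitrarily

W_out = torch.randn(d_ffn, d_model)            # FFN second projection (neuron -> residual stream)
W_unembed = torch.randn(d_model, vocab)        # LM head / unembedding matrix
activations = torch.relu(torch.randn(d_ffn))   # neuron activations at the caption token position

target_token = 42  # id of the visual concept word (e.g. "dog"); assumed known

# Contribution score per neuron: a_i * (w_out_i @ u_target).
# This needs only a forward pass; no backward/gradient computation is involved.
scores = activations * (W_out @ W_unembed[:, target_token])

top_k = torch.topk(scores, k=5).indices
print("candidate multi-modal neurons:", top_k.tolist())

# A crude editing step in this sketch: zero out the selected neurons' output rows,
# suppressing whatever they write into the residual stream for this concept.
W_out_edited = W_out.clone()
W_out_edited[top_k] = 0.0
```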
Anthology ID:
2024.findings-acl.60
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1012–1037
URL:
https://aclanthology.org/2024.findings-acl.60
Cite (ACL):
Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang, and Meng Wang. 2024. Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers. In Findings of the Association for Computational Linguistics ACL 2024, pages 1012–1037, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers (Pan et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.60.pdf