Fine-grained Artificial Neurons in Audio-transformers for Disentangling Neural Auditory Encoding

Mengyue Zhou; Xu Liu; David Liu; Zihao Wu (吴梓浩); Zhengliang Liu; Lin Zhao; Dajiang Zhu; Lei Guo; Junwei Han; Tianming Liu; Xintao Hu

doi:10.18653/v1/2023.findings-acl.503

Abstract

Wav2Vec and its variants have achieved unprecedented success in computational auditory and speech processing. Meanwhile, neural encoding studies that exploit the representation capability of Wav2Vec and link those representations to brain activity have provided novel insights into a fundamental question: how auditory and speech processing unfolds in the human brain. Without an explicit definition, most existing studies treat each transformer encoding layer in Wav2Vec as a single artificial neuron (AN); that is, the layer-level embeddings are used to predict neural responses. However, the layer-level embedding aggregates multiple types of contextual attention captured by multi-head self-attention (MSA) modules, so layer-level ANs lack the granularity needed for neural encoding. To address this limitation, we define the elementary units, i.e., each hidden dimension, as neuron-level ANs in Wav2Vec2.0, quantify their temporal responses, and couple those ANs with their biological-neuron (BN) counterparts in the human brain. Our experimental results demonstrate that: 1) the proposed neuron-level ANs carry meaningful neurolinguistic information; 2) those ANs anchor to their BN signatures; 3) the AN-BN anchoring patterns are interpretable from a neurolinguistic perspective. More importantly, our results suggest an intermediate stage in both the computational representation in Wav2Vec2.0 and the cortical representation in the brain. Our study validates the fine-grained ANs in Wav2Vec2.0, which may serve as a novel and general strategy for linking transformer-based deep learning models to neural responses in order to probe sensory processing in the brain.
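The idea of treating each hidden dimension as its own AN, rather than the whole layer, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not say how temporal responses are quantified or how ANs are coupled to BNs, so Pearson correlation stands in for the coupling step, the data are synthetic, and the function names `neuron_level_responses` and `anchor_ans_to_bn` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant signal has no defined correlation
    return cov / (sx * sy)


def neuron_level_responses(layer_embeddings):
    """Split a (time x hidden_dim) embedding into one time series per hidden dimension.

    Each returned series plays the role of one neuron-level AN (illustrative only).
    """
    return [list(column) for column in zip(*layer_embeddings)]


def anchor_ans_to_bn(layer_embeddings, bn_response):
    """Score every hypothetical AN against one BN signal.

    Correlation is an assumed stand-in for the paper's coupling procedure.
    Returns the index of the AN with the largest absolute correlation and all scores.
    """
    ans = neuron_level_responses(layer_embeddings)
    scores = [pearson(an, bn_response) for an in ans]
    best = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return best, scores


# Synthetic toy data: 200 time steps, 8 hidden dimensions.
random.seed(0)
T, D = 200, 8
embeddings = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

# Hypothetical BN signal: a noisy copy of hidden dimension 3.
bn = [row[3] + 0.1 * random.gauss(0, 1) for row in embeddings]

best, scores = anchor_ans_to_bn(embeddings, bn)
print("best-anchored AN index:", best)
```

A layer-level analysis would use all eight dimensions jointly as one predictor; scoring each dimension separately is what lets an individual AN be paired with an individual BN signal in this sketch.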

Anthology ID: 2023.findings-acl.503
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7943–7956
URL: https://aclanthology.org/2023.findings-acl.503/
DOI: 10.18653/v1/2023.findings-acl.503
Cite (ACL): Mengyue Zhou, Xu Liu, David Liu, Zihao Wu, Zhengliang Liu, Lin Zhao, Dajiang Zhu, Lei Guo, Junwei Han, Tianming Liu, and Xintao Hu. 2023. Fine-grained Artificial Neurons in Audio-transformers for Disentangling Neural Auditory Encoding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7943–7956, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Fine-grained Artificial Neurons in Audio-transformers for Disentangling Neural Auditory Encoding (Zhou et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.503.pdf
