“Average” Approximates “First Principal Component”? An Empirical Analysis on Representations from Neural Language Models

Zihan Wang, Chengyu Dong, Jingbo Shang


Abstract
Contextualized representations based on neural language models have furthered the state of the art in various NLP tasks. Despite their great success, the nature of such representations remains a mystery. In this paper, we present an empirical property of these representations: "average" approximates "first principal component". Specifically, experiments show that the average of these representations shares almost the same direction as the first principal component of the matrix whose columns are these representations. We believe this explains why the average representation is always a simple yet strong baseline. Our further examinations show that this property also holds in more challenging scenarios, for example, when the representations come from a model right after its random initialization. Therefore, we conjecture that this property is intrinsic to the distribution of the representations and not necessarily related to the structure of the input. We observe that these representations empirically follow a normal distribution in each dimension, and by assuming this holds, we show that the empirical property can in fact be derived mathematically.
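As a quick check of the stated property (our own illustration, not the authors' code), the following Python sketch draws vectors whose coordinates are independently normal with nonzero per-dimension means, the distributional assumption named in the abstract, and compares the direction of their average with the first left singular vector of the uncentered matrix whose columns are those vectors. The dimension, sample count, mu, and sigma are hypothetical stand-ins.

    # Minimal sketch, assuming per-dimension normal representations as in the abstract.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 768, 2000                            # hypothetical dimension / sample count
    mu = rng.normal(size=d)                     # per-dimension means (assumed nonzero)
    sigma = np.abs(rng.normal(size=d))          # per-dimension standard deviations
    X = mu[:, None] + sigma[:, None] * rng.normal(size=(d, n))  # columns = representations

    avg = X.mean(axis=1)                        # the "average" representation
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    pc1 = U[:, 0]                               # first principal direction of the uncentered matrix

    # |cosine| absorbs the sign ambiguity of singular vectors; a value near 1
    # means the average and the first principal component share a direction.
    cos = abs(avg @ pc1) / np.linalg.norm(avg)
    print(f"|cosine(average, PC1)| = {cos:.4f}")

Under this assumption the alignment is unsurprising: the uncentered second-moment matrix is roughly n(mu muᵀ + Σ) with diagonal Σ, so when the mean vector is large relative to the per-dimension variances, its direction, which is also the direction of the average, dominates the top eigenvector.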
Anthology ID:
2021.emnlp-main.453
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5594–5603
URL:
https://aclanthology.org/2021.emnlp-main.453
DOI:
10.18653/v1/2021.emnlp-main.453
Bibkey:
wang-etal-2021-average
Cite (ACL):
Zihan Wang, Chengyu Dong, and Jingbo Shang. 2021. “Average” Approximates “First Principal Component”? An Empirical Analysis on Representations from Neural Language Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5594–5603, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
“Average” Approximates “First Principal Component”? An Empirical Analysis on Representations from Neural Language Models (Wang et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.453.pdf
Video:
https://aclanthology.org/2021.emnlp-main.453.mp4
Data
KP20k