Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira


Abstract
Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
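The two steps named in the abstract, scoring pairs of components by a higher-order correlation and linking them with a maximum spanning tree, can be illustrated with a minimal sketch. The abstract does not state the formula, so the sketch assumes the higher-order correlation of two standardized components s_i and s_j is the sample mean of s_i^2 * s_j^2, and it assumes the components have already been estimated by ICA. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def higher_order_corr(si, sj):
    """Sample mean of s_i^2 * s_j^2 over words (assumed definition)."""
    return sum((a * a) * (b * b) for a, b in zip(si, sj)) / len(si)


def maximum_spanning_tree(components):
    """Kruskal's algorithm on the complete graph over components.

    components: list of equal-length lists, one per component, each
    assumed to be centered and scaled to unit variance.
    Returns a list of (i, j, weight) edges.
    """
    k = len(components)
    edges = [
        (higher_order_corr(components[i], components[j]), i, j)
        for i, j in combinations(range(k), 2)
    ]
    edges.sort(reverse=True)  # consider the heaviest edges first

    parent = list(range(k))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two separate trees, so keep it
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Toy data (hypothetical): 3 components over 4 "words".
r = 2 ** 0.5
toy = [
    [r, -r, 0.0, 0.0],   # component 0
    [r, r, 0.0, 0.0],    # component 1: uncorrelated with 0, same active words
    [0.0, 0.0, r, -r],   # component 2: active on different words
]
tree = maximum_spanning_tree(toy)
for i, j, w in tree:
    print(i, j, round(w, 3))
```

In the toy data, components 0 and 1 have zero ordinary correlation, yet their squared values are large on the same words, so the pair receives the heaviest edge. This is the kind of residual dependence that a second-order measure would miss.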
Anthology ID:
2024.emnlp-main.169
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2883–2899
URL:
https://aclanthology.org/2024.emnlp-main.169
Cite (ACL):
Momose Oyama, Hiroaki Yamagiwa, and Hidetoshi Shimodaira. 2024. Understanding Higher-Order Correlations Among Semantic Components in Embeddings. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2883–2899, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Understanding Higher-Order Correlations Among Semantic Components in Embeddings (Oyama et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.169.pdf