Analyzing Encoded Concepts in Transformer Language Models

Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Khan, Jia Xu


Abstract
We propose a novel framework, ConceptX, to analyze how latent concepts are encoded in representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning them with a large set of human-defined concepts. Our analysis of seven transformer language models reveals interesting insights: i) the latent space within the learned representations overlaps with different linguistic concepts to a varying degree, ii) the lower layers in the model are dominated by lexical concepts (e.g., affixation) and linguistic ontologies (e.g., WordNet), whereas the core-linguistic concepts (e.g., morphology, syntactic relations) are better represented in the middle and higher layers, iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.
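The two steps named in the abstract, clustering representations and then aligning each cluster with a human-defined concept, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the average-linkage agglomerative clustering, the 0.9 alignment threshold, the helper names `agglomerative_clusters` and `align`, and the toy two-dimensional vectors with part-of-speech labels.

```python
from collections import Counter
from math import dist


def agglomerative_clusters(vectors, k):
    """Greedy average-linkage clustering (assumed choice); returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(dist(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def align(cluster, labels, threshold=0.9):
    """Return the label covering at least `threshold` of the cluster, else None.

    The threshold value is a hypothetical default chosen for this sketch.
    """
    label, count = Counter(labels[i] for i in cluster).most_common(1)[0]
    return label if count / len(cluster) >= threshold else None


# Toy 2-D "representations" with hypothetical part-of-speech labels.
tokens = ["run", "walk", "jump", "red", "blue", "green"]
vectors = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
           (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
pos_labels = ["VERB", "VERB", "VERB", "ADJ", "ADJ", "ADJ"]

clusters = agglomerative_clusters(vectors, k=2)
aligned = {tuple(tokens[i] for i in c): align(c, pos_labels) for c in clusters}
```

In this toy setup each cluster is dominated by a single label, so both clusters align. A cluster whose members mix labels below the threshold returns `None`, which loosely mirrors the abstract's observation that some encoded concepts are not adequately explained by existing human-defined concepts.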
Anthology ID:
2022.naacl-main.225
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3082–3101
URL:
https://aclanthology.org/2022.naacl-main.225
DOI:
10.18653/v1/2022.naacl-main.225
Cite (ACL):
Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Khan, and Jia Xu. 2022. Analyzing Encoded Concepts in Transformer Language Models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3082–3101, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Analyzing Encoded Concepts in Transformer Language Models (Sajjad et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.225.pdf
Software:
 2022.naacl-main.225.software.zip
Code
 hsajjad/conceptx