Eunji Kim
2024
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
Eunji Kim | Kyuhong Shim | Simyung Chang | Sungroh Yoon
Findings of the Association for Computational Linguistics: EMNLP 2024
A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different textual elements within a sentence depending on the context, efforts to account for variation of importance in constructing text embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which incorporates controllability as well. SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance, enabling finer control over emphasis responsive to data-driven insights and user preferences. The efficacy of SToRI is demonstrated through comprehensive experiments on few-shot image classification and image retrieval tailored to user preferences.
2020
Interpretation of NLP models through input marginalization
Siwon Kim | Jihun Yi | Eunji Kim | Sungroh Yoon
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
To demystify the “black box” property of deep neural networks for natural language processing (NLP), several methods have been proposed to interpret their predictions by measuring the change in prediction probability after erasing each token of an input. Since existing methods replace each token with a predefined value (i.e., zero), the resulting sentence lies out of the training data distribution, yielding misleading interpretations. In this study, we raise the out-of-distribution problem induced by the existing interpretation methods and present a remedy; we propose to marginalize each token out. We interpret various NLP models trained for sentiment analysis and natural language inference using the proposed method.
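The idea of marginalizing a token out, rather than replacing it with a fixed value, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the toy vocabulary `VOCAB`, the toy sentiment model `classifier_prob`, and the uniform `replacement_dist` that stands in for a language model's distribution over candidate tokens. The attribution is taken as the plain change in prediction probability, following the wording of the abstract.

```python
import math

# Hypothetical toy vocabulary; not from the paper.
VOCAB = ["good", "bad", "fine", "movie", "the"]


def classifier_prob(tokens):
    """Toy sentiment model (assumption): P(positive | tokens) as a logistic over word scores."""
    scores = {"good": 2.0, "fine": 0.5, "bad": -2.0}
    z = sum(scores.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def replacement_dist(tokens, position):
    """Placeholder for P(candidate | context); uniform here, a language model in practice."""
    return {w: 1.0 / len(VOCAB) for w in VOCAB}


def marginalized_prob(tokens, position):
    """Expected prediction when the token at `position` is marginalized out."""
    total = 0.0
    for candidate, p in replacement_dist(tokens, position).items():
        replaced = tokens[:position] + [candidate] + tokens[position + 1:]
        total += p * classifier_prob(replaced)
    return total


def attributions(tokens):
    """Change in prediction probability per token: original minus marginalized."""
    base = classifier_prob(tokens)
    return [base - marginalized_prob(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["the", "movie", "good"]
    for token, score in zip(sentence, attributions(sentence)):
        print(f"{token}: {score:+.3f}")
```

In this toy setup the sentiment-bearing word "good" receives a much larger attribution than the neutral words, because replacing it with likely alternatives changes the prediction the most. Every replaced sentence is built from in-vocabulary tokens, which is the point of marginalizing instead of substituting a predefined value.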