Describing Sets of Images with Textual-PCA

Oded Hupert, Idan Schwartz, Lior Wolf


Abstract
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set. Our procedure is analogous to Principal Component Analysis, with generated phrases taking the role of the projection vectors. First, a centroid phrase that has the largest average semantic similarity to the images in the set is generated, where both the computation of the similarity and the generation are based on pretrained vision-language models. Then, using the same models, we generate the phrase whose similarity scores vary the most across the images in the set. The next phrase maximizes the variance subject to being orthogonal, in the latent space, to the highest-variance phrase, and the process continues. Our experiments show that our method is able to convincingly capture the essence of image sets and describe the individual elements in a semantically meaningful way within the context of the entire set. Our code is available at: https://github.com/OdedH/textual-pca.
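
As a rough illustration of the PCA analogy in the abstract, the sketch below works on toy vectors and is not the authors' implementation. It assumes image and phrase embeddings are already available in a shared space, and it selects phrases from a fixed candidate pool instead of generating them with a language model, which is a deliberate simplification. The function name `textual_pca_select`, the `ortho_tol` threshold, and the toy data are all hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def textual_pca_select(image_emb, phrase_emb, n_components=2, ortho_tol=0.1):
    """Pick a centroid phrase and near-orthogonal high-variance phrases.

    Assumption: candidates come from a fixed pool (the paper generates them).
    """
    imgs = normalize(np.asarray(image_emb, dtype=float))
    phrases = normalize(np.asarray(phrase_emb, dtype=float))
    sims = imgs @ phrases.T  # shape: (n_images, n_phrases)

    # Centroid phrase: largest average similarity to the images in the set.
    centroid = int(np.argmax(sims.mean(axis=0)))

    # Variance of each phrase's similarity scores across the set.
    variances = sims.var(axis=0)

    chosen = []
    for _ in range(n_components):
        best, best_var = None, -1.0
        for j in range(len(phrases)):
            # Assumption: the centroid phrase is not reused as a component.
            if j == centroid or j in chosen:
                continue
            # Approximate orthogonality to every previously chosen phrase.
            if any(abs(phrases[j] @ phrases[k]) > ortho_tol for k in chosen):
                continue
            if variances[j] > best_var:
                best, best_var = j, variances[j]
        if best is None:
            break
        chosen.append(best)
    return centroid, chosen


# Toy 3-D embeddings (hypothetical): images vary mostly along the second axis.
images = np.array([[1, 1, 0], [1, -1, 0], [1, 0, 0.5], [1, 0, -0.5]])
phrases = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0.2], [0, 0, 1]])

centroid, components = textual_pca_select(images, phrases)
print("centroid phrase index:", centroid)
print("component phrase indices:", components)
```

In this toy setup the third candidate is nearly parallel to the second, so the orthogonality check skips it in favour of the fourth, mirroring how each later phrase is constrained by the earlier ones.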
Anthology ID:
2022.findings-emnlp.279
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3811–3821
URL:
https://aclanthology.org/2022.findings-emnlp.279
DOI:
10.18653/v1/2022.findings-emnlp.279
Cite (ACL):
Oded Hupert, Idan Schwartz, and Lior Wolf. 2022. Describing Sets of Images with Textual-PCA. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3811–3821, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Describing Sets of Images with Textual-PCA (Hupert et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.279.pdf
Video:
https://aclanthology.org/2022.findings-emnlp.279.mp4