What Do Vision–Language Models Encode for Personalized Image Aesthetics Assessment?

Koki Ryu; Hitomi Yanaka

doi:10.18653/v1/2026.findings-acl.1706

Abstract

Personalized image aesthetics assessment (PIAA) is an important research problem with real-world applications. While methods based on vision-language models (VLMs) are promising candidates for PIAA, it remains unclear whether they internally encode rich, multi-level aesthetic attributes required for effective personalization. In this paper, we first analyze the internal representations of VLMs to examine the presence and distribution of such aesthetic attributes, and then leverage them for lightweight, individual-level personalization without model fine-tuning. Our analysis reveals that VLMs encode diverse aesthetic attributes that propagate into the language decoder layers. Building on these representations, we demonstrate that simple linear models can achieve effective personalized image aesthetics assessment. We further analyze how aesthetic information is transferred across layers in different VLM architectures and across image domains. Our findings provide insights into how VLMs can be utilized for modeling subjective, individual aesthetic preferences.
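The abstract does not say which probe or regressor is used, so the following is only a minimal sketch under stated assumptions. It assumes (1) layer-wise analysis means fitting a ridge-regression probe on pooled hidden states from each layer, and (2) personalization means fitting a separate ridge model per user on frozen features. Spearman correlation is assumed as the metric. The features are synthetic stand-ins, not outputs of a real VLM. The names `fit_ridge`, `probe_layers`, and `personalize` are hypothetical and are not the authors' code.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def spearman(a, b):
    """Spearman correlation via ranks (assumes no ties, as with continuous scores)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def probe_layers(layer_feats, y, n_train, alpha=1.0):
    """Fit one linear probe per layer and return held-out Spearman per layer."""
    out = []
    for X in layer_feats:
        w, b = fit_ridge(X[:n_train], y[:n_train], alpha)
        out.append(spearman(X[n_train:] @ w + b, y[n_train:]))
    return out


def personalize(X, user_scores, n_train, alpha=1.0):
    """Fit a separate linear model per user on shared frozen features."""
    return {u: fit_ridge(X[:n_train], s[:n_train], alpha)
            for u, s in user_scores.items()}


# Synthetic stand-ins for pooled hidden states (hypothetical; no real VLM is run).
rng = np.random.default_rng(0)
n, d, n_train = 200, 16, 150
attribute = rng.normal(size=n)             # a latent aesthetic attribute
direction = rng.normal(size=d)
noise_layer = rng.normal(size=(n, d))      # carries no information about the attribute
signal_layer = np.outer(attribute, direction) + 0.1 * rng.normal(size=(n, d))
layer_scores = probe_layers([noise_layer, signal_layer], attribute, n_train)

# Two hypothetical users who weigh the attribute in opposite directions.
users = {"A": attribute + 0.1 * rng.normal(size=n),
         "B": -attribute + 0.1 * rng.normal(size=n)}
models = personalize(signal_layer, users, n_train)
user_scores = {u: spearman(signal_layer[n_train:] @ w + b, users[u][n_train:])
               for u, (w, b) in models.items()}
print(layer_scores, user_scores)
```

In this setup the VLM stays frozen. Adapting to a new user therefore only requires solving one small d×d linear system on that user's rated images, which is in the spirit of the abstract's "without model fine-tuning". Comparing held-out probe scores across layers is one simple way to ask where an attribute is linearly decodable.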

Anthology ID:: 2026.findings-acl.1706
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 34146–34167
URL:: https://aclanthology.org/2026.findings-acl.1706/
DOI:: 10.18653/v1/2026.findings-acl.1706
Cite (ACL):: Koki Ryu and Hitomi Yanaka. 2026. What Do Vision–Language Models Encode for Personalized Image Aesthetics Assessment?. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34146–34167, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: What Do Vision–Language Models Encode for Personalized Image Aesthetics Assessment? (Ryu & Yanaka, Findings 2026)
PDF:: https://aclanthology.org/2026.findings-acl.1706.pdf
Checklist:: 2026.findings-acl.1706.checklist.pdf
