Felix Matthias Saaro


2026

Evaluating attribute control success in controllable text generation and related generation tasks typically relies on pretrained classifiers. We show that this widely used classify-and-count approach yields biased and inconsistent estimates that vary substantially with the choice of classifier. We frame control success estimation as a quantification task and apply a hybrid Bayesian method that combines classifier predictions with a small number of human labels for calibration. To evaluate our approach, we collected a two-modality test dataset consisting of 600 human-rated samples and 60,000 automatically rated samples. Our experiments show that the method produces robust estimates of control success across both text and text-to-image generation tasks, offering a principled alternative to current evaluation practices.