Rebuttal for "Revisiting the Evaluation of Image Synthesis with GANs" Anonymous Author(s)
Our presentation is organized this way for the following reasons. In Section 2.3, we present the details of the generative models, the evaluated datasets, and the analysis approaches (including our visualization tool, the histogram matching attack, and the human evaluation). These are independent of each other, so we discuss them in parallel in the main paper. In Section 3.1, we investigate the feature extractors by first identifying their attention on visual semantics and then examining their robustness to the histogram matching attack. Finally, we filter out extractors that define similar representation spaces. Each of these studies builds on the previous one, so they are organized in a progressive manner.
Revisiting the Evaluation of Image Synthesis with GANs
A good metric, which promises a reliable comparison between solutions, is essential for any well-defined task. Unlike most vision tasks, which have per-sample ground truth, image synthesis tasks aim to generate unseen data and hence are usually evaluated through a distributional distance between one set of real samples and another set of generated samples. This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we analyze in depth various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set. Extensive experiments conducted on multiple datasets and settings reveal several important findings. Firstly, a group of models that includes both CNN-based and ViT-based architectures serves as reliable and robust feature extractors for measurement evaluation. Secondly, Centered Kernel Alignment (CKA) provides a better comparison across various extractors and across hierarchical layers within one model. Finally, CKA is more sample-efficient and agrees better with human judgment in characterizing the similarity between two internal data correlations. These findings contribute to the development of a new measurement system, which enables a consistent and reliable re-evaluation of current state-of-the-art generative models.
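To make the CKA measurement mentioned above more concrete, the following is a minimal sketch of linear CKA in its commonly used textbook form, written in plain Python. It assumes two feature matrices with the same number of rows (one row per sample). The function names, the choice of a linear kernel, and the row pairing are assumptions made for illustration; the abstract does not specify how the authors compute CKA between real and generated sets.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are feature dimensions


def center_columns(x: Matrix) -> Matrix:
    """Subtract the per-column mean so every feature has zero mean."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in x]


def cross_sq_frobenius(a: Matrix, b: Matrix) -> float:
    """Return ||A^T B||_F^2 for two matrices with the same number of rows."""
    n = len(a)
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(n))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two feature matrices (illustrative helper).

    Assumption: x and y have the same number of rows; the feature
    dimensions may differ.
    """
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty with equal row counts")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_frobenius(xc, yc)
    denominator = math.sqrt(
        cross_sq_frobenius(xc, xc) * cross_sq_frobenius(yc, yc)
    )
    if denominator == 0.0:
        raise ValueError("features have zero variance; CKA is undefined")
    return numerator / denominator


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 5.0], [4.0, 1.0], [0.0, 2.0]]
    scaled = [[3.0 * v for v in row] for row in feats]
    print(linear_cka(feats, scaled))
```

In this form the score lies in [0, 1] and is unchanged by isotropic scaling or by an orthogonal transform of either feature matrix, which is one reason such a measure can be compared across extractors whose features differ in scale and dimension.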
Supplementary Material
R(h). (23) Here, for simplicity, we abused the symbol D in (22) by maximizing out h0 in the original D. In the top-left area of P, suppose only one example (marked by x, with vertical coordinate 1) is confidently labeled as positive, while the remaining examples are labeled with very low confidence and hence do not contribute to the risk R. Similarly, there is only one confidently labeled example () in the bottom-right area of P, and it is negative with vertical coordinate 1. Whenever λ > 2, the optimal hλ is in (0,1) and can be obtained by solving a quadratic equation. In contrast, di-MDD is immune to this problem because R is used only to determine h, while the di-MDD value itself is contributed solely by D. As in the large-λ scenario, we do not change the feature distributions of the source and target domains, hence keeping D(h) = 1 |h|.
Continual Learning
However, they generally lose performance in more realistic scenarios such as learning in a continual manner. In contrast, humans can incorporate their prior knowledge to learn new concepts efficiently without forgetting older ones. In this work, we leverage meta-learning to encourage the model to learn how to learn continually. Inspired by human concept learning, we develop a generative classifier that efficiently uses data-driven experience to learn new concepts even from few samples while being immune to forgetting. Along with cognitive and theoretical insights, extensive experiments on standard benchmarks demonstrate the effectiveness of the proposed method.