architectures
Neural Information Processing Systems
A.1 Face experiments

For the encoder, we use a ResNet-50 backbone followed by projection heads that output the pointwise, lower-quantile, and upper-quantile predictions. Each projection head consists of a convolution layer followed by a LeakyReLU activation and a global average pooling layer. The input to each projection head is the output of the backbone network, a feature map of size 512 × 4 × 4, and its output dimension is the number of style dimensions; for the FFHQ-pretrained StyleGAN2 used in our experiments, this value is 9088. For the generator, we use a StyleGAN2 pretrained on FFHQ to output faces at a resolution of 1024 × 1024, obtained from the official implementation. No discriminator is used during training.
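As an illustration only, the projection head described above (convolution, LeakyReLU, global average pooling) might be sketched in plain Python as follows. The text does not state the kernel size, the negative slope, or the initialization, so the 1 × 1 kernel, the slope of 0.01, and the Gaussian weights are assumptions, as is the use of one head per output. The names `projection_head` and `make_head` are hypothetical, and toy dimensions stand in for the 512 × 4 × 4 feature map and the 9088 style dimensions to keep the example fast.

```python
import random


def leaky_relu(x, negative_slope=0.01):
    # Assumed slope; the text does not give one.
    return x if x >= 0.0 else negative_slope * x


def projection_head(feature_map, weight, bias, negative_slope=0.01):
    """Apply conv (assumed 1x1) -> LeakyReLU -> global average pooling.

    feature_map: nested list of shape [C_in][H][W]
    weight:      nested list of shape [C_out][C_in]
    bias:        list of length C_out
    Returns a list of length C_out.
    """
    c_in = len(feature_map)
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    out = []
    for w_row, b in zip(weight, bias):
        total = 0.0
        for i in range(h):
            for j in range(w):
                # A 1x1 convolution is a per-location linear map over channels.
                z = b + sum(w_row[c] * feature_map[c][i][j] for c in range(c_in))
                total += leaky_relu(z, negative_slope)
        # Global average pooling over the H x W spatial grid.
        out.append(total / (h * w))
    return out


def make_head(c_in, c_out, rng):
    # Hypothetical initialization, for illustration only.
    weight = [[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(c_out)]
    bias = [0.0] * c_out
    return weight, bias


rng = random.Random(0)

# Toy sizes; the text uses C_IN=512, H=W=4, N_STYLE=9088.
C_IN, H, W, N_STYLE = 8, 4, 4, 16

# Stand-in for the backbone output.
features = [[[rng.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)]
            for _ in range(C_IN)]

# One head per output: pointwise, lower-quantile, and upper-quantile predictions.
heads = {name: make_head(C_IN, N_STYLE, rng) for name in ("point", "lower", "upper")}
outputs = {name: projection_head(features, w, b) for name, (w, b) in heads.items()}
```

Each entry of `outputs` is a vector with one value per style dimension, so with the dimensions from the text each head would map a 512 × 4 × 4 feature map to a 9088-dimensional vector.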
Apr-25-2026, 04:58:01 GMT