Locally Hierarchical Auto-Regressive Modeling for Image Generation
Supplementary Document

Neural Information Processing Systems

At the first epoch, the learning rate is warmed up gradually from lr_init = 1e-5 to lr_peak. Figures A and B show the performance of the baseline and of rejection sampling while varying hyperparameters such as top-k, softmax temperature, and acceptance ratio. For baseline sampling on ImageNet, the setting with k = 2048 and temperature t = 0.95 achieves the best FID in the small and medium models and the second-best FID in the large model.

Figure C: Examples of reconstructed images using HQ-VAE with the learnable down- and upsampling layers.

B.3 Prediction Head Transformer (PHT)

We propose locally hierarchical decoding in PHT, in contrast to the standard sequential approach, by assuming conditional independence among the bottom codes given a top code. We use pixel-shuffle and -unshuffle for the resizing operations, as illustrated in (a), while recursively quantizing hierarchical feature maps to obtain three-level codes: top, middle, and bottom.
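The conditional-independence assumption above lets all bottom codes in a local block be emitted in one parallel step instead of one at a time. A minimal sketch of the factorization, with illustrative notation (x^top for a top code, x^bot_i for the bottom codes in its block, r^2 for the block size; these symbols are assumptions, not taken verbatim from the paper):

```latex
% Standard sequential (fully auto-regressive) decoding of a block:
%   p(x^{bot}_{1:r^2} \mid x^{top})
%     = \prod_{i=1}^{r^2} p(x^{bot}_i \mid x^{bot}_{<i}, x^{top})
% Under the locally hierarchical assumption, bottom codes are
% conditionally independent given the top code:
p\!\left(x^{bot}_{1:r^2} \,\middle|\, x^{top}\right)
  = \prod_{i=1}^{r^2} p\!\left(x^{bot}_i \,\middle|\, x^{top}\right)
```

Dropping the dependence on x^bot_{<i} is what allows a single forward pass of the prediction head to produce the whole block.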
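The pixel-shuffle and -unshuffle resizing mentioned above can be sketched in NumPy as follows. The function names and the (C, H, W) layout mirror PyTorch's `nn.PixelShuffle`/`nn.PixelUnshuffle` conventions; this standalone version is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Space-to-depth: rearrange a (C, H, W) map into (C*r*r, H/r, W/r),
    folding each r x r spatial block into the channel dimension."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    # bring the two block axes next to the channel axis, then flatten them
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

def pixel_shuffle(x, r):
    """Depth-to-space: the exact inverse, (C*r*r, H, W) -> (C, H*r, W*r)."""
    c, h, w = x.shape
    x = x.reshape(c // (r * r), r, r, h, w)
    # interleave the block axes back into the spatial dimensions
    return x.transpose(0, 3, 1, 4, 2).reshape(c // (r * r), h * r, w * r)

x = np.arange(3 * 4 * 4).reshape(3, 4, 4)
down = pixel_unshuffle(x, 2)          # shape (12, 2, 2)
restored = pixel_shuffle(down, 2)     # round-trips back to (3, 4, 4)
```

Because both operations are pure permutations of elements, they are lossless and invertible, which makes them convenient learnable-free resizing steps between hierarchy levels.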
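The top-k and temperature hyperparameters swept in Figures A and B (e.g. k = 2048, t = 0.95) combine as follows at sampling time. This NumPy sketch is illustrative; the function name and small vocabulary are assumptions for the example.

```python
import numpy as np

def top_k_sample(logits, k, temperature, rng):
    """Sample one token id: scale logits by 1/temperature, keep only the
    k largest, renormalize with a softmax, then draw from the result."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if k < logits.size:
        kth = np.sort(logits)[-k]
        # mask everything below the k-th largest logit
        logits = np.where(logits >= kth, logits, -np.inf)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
# with k=2, only the two largest logits (indices 0 and 1) can be drawn
token = top_k_sample([2.0, 1.0, 0.5, -1.0], k=2, temperature=0.95, rng=rng)
```

A temperature below 1 sharpens the distribution toward high-probability codes, while top-k truncation removes the low-probability tail; the FID sweep in Figures A and B tunes the trade-off between the two.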