Locally Hierarchical Auto-Regressive Modeling for Image Generation
Supplementary Document

A Implementation Details

A.1 HQ-VAE

Our implementation is based on PyTorch 1.10. The detailed architecture of HQ-TVAE is presented in Table B. When learning the resizing operations, we apply two different loss functions.

Figure C: Examples of reconstructed images using HQ-VAE with the learnable down- and up-sampling layers.
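A minimal sketch of such learnable down- and up-sampling layers is given below, assuming a strided convolution paired with a transposed convolution; the channel width, kernel size, and scale factor are illustrative assumptions, and the actual configuration is the one specified in Table B.

```python
import torch
import torch.nn as nn

class LearnableResize(nn.Module):
    """Sketch of a learnable down-/up-sampling pair (hypothetical
    configuration; see Table B for the actual architecture)."""

    def __init__(self, channels: int = 256, factor: int = 2):
        super().__init__()
        # Down-sampling: strided convolution reduces resolution by `factor`.
        self.down = nn.Conv2d(channels, channels, kernel_size=2 * factor,
                              stride=factor, padding=factor // 2)
        # Up-sampling: transposed convolution restores the resolution.
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=2 * factor,
                                     stride=factor, padding=factor // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


x = torch.randn(1, 256, 32, 32)
assert LearnableResize()(x).shape == x.shape  # original resolution is recovered
```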
A.2 HQ-Transformer

The input of the main transformer starts with the start-of-sentence (SOS) token. We set the number of self-attention blocks in the IET to 1 or 2.

B Ablation Study

We use the smallest model, HQ-Transformer (S), to verify our architectural choices. In the PHT, we propose locally hierarchical decoding, in contrast to the standard sequential approach, by assuming conditional independence among the bottom codes given their top code. The ablation study in Table C(b) demonstrates the benefit of this decoding strategy with respect to image generation quality.

Table C: Ablation study on architectural choices with HQ-Transformer (S).

      Input embedding   Decoding policy                     Label type      (top-k, t)    FID     Precision   Recall
  (a) Addition          Locally hierarchical conditioning   One-hot label   (2048, 0.9)   11.03   0.70        0.55
      IET               …

B.4 Soft-Labeling in HQ-Transformer

Table C(c) shows that soft-labeling improves FID compared to one-hot labeling.
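A minimal sketch of such a soft-label objective follows: the one-hot cross-entropy target is replaced by a soft distribution over codebook entries. Deriving the targets from encoder-to-codebook distances (`soft_targets_from_distances`, with a temperature `tau`) is an illustrative assumption, not necessarily the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against a soft target distribution instead of a one-hot label.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

def soft_targets_from_distances(z: torch.Tensor, codebook: torch.Tensor,
                                tau: float = 1.0) -> torch.Tensor:
    # Hypothetical construction: codes closer to the encoder feature `z`
    # receive more probability mass.
    d2 = torch.cdist(z, codebook).pow(2)   # (N, V) squared distances
    return F.softmax(-d2 / tau, dim=-1)    # soft distribution over the V codes
```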
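Returning to the locally hierarchical decoding of the PHT, the sketch below samples top codes auto-regressively starting from the SOS token, then draws all bottom codes of each cell in parallel under the conditional-independence assumption. The modules (`backbone`, `top_head`, `bottom_head`, `top_emb`) are hypothetical stand-ins for the actual HQ-Transformer components; the sampler mirrors the (top-k, t) = (2048, 0.9) setting from Table C.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_top_k(logits: torch.Tensor, k: int = 2048, t: float = 0.9) -> torch.Tensor:
    # Top-k sampling with temperature, as in (top-k, t) = (2048, 0.9) of Table C.
    logits = logits / t
    k = min(k, logits.size(-1))
    kth = torch.topk(logits, k, dim=-1).values[..., -1:]
    logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    flat = probs.reshape(-1, probs.size(-1))
    return torch.multinomial(flat, 1).reshape(probs.shape[:-1])

@torch.no_grad()
def locally_hierarchical_decode(backbone, top_head, bottom_head, top_emb,
                                sos: torch.Tensor, steps: int, num_bottom: int = 4):
    x = sos.view(1, 1, -1)                        # sequence starts with the SOS embedding
    tops, bottoms = [], []
    for _ in range(steps):
        # Only already-generated tokens are in `x`, and only the last
        # position's state is read, so no causal mask is needed here.
        h = backbone(x)[:, -1]                    # (1, d) context at the last position
        top = sample_top_k(top_head(h))           # next top code, sampled sequentially
        b_logits = bottom_head(h + top_emb(top))  # logits for all bottom codes at once
        b_logits = b_logits.view(1, num_bottom, -1)
        bottoms.append(sample_top_k(b_logits))    # bottom codes sampled in parallel
        tops.append(top)
        x = torch.cat([x, top_emb(top).unsqueeze(1)], dim=1)
    return torch.stack(tops, dim=1), torch.stack(bottoms, dim=1)

# Toy usage with stand-in modules (d = width, V = codebook size).
d, V = 64, 512
layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=1)
tops, bottoms = locally_hierarchical_decode(
    backbone, nn.Linear(d, V), nn.Linear(d, 4 * V), nn.Embedding(V, d),
    sos=torch.zeros(d), steps=8)
print(tops.shape, bottoms.shape)  # (1, 8) and (1, 8, 4)
```

Drawing all bottom codes of a cell in a single step is what shortens the effective sequence length relative to fully sequential decoding; the conditional-independence assumption given the top code is what makes this parallel draw valid.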