Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
