Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Open in new window