Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
