Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces