Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models