FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model