VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models