DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics