Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence