Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect