The in-context inductive biases of vision-language models differ across modalities

Open in new window