Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
