On the Limitations of Vision-Language Models in Understanding Image Transforms
