Interpreting CLIP's Image Representation via Text-Based Decomposition
