CLIP in Mirror: Disentangling text from visual images through reflection

Neural Information Processing Systems

"egg" due to the visual object of eggs.