Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
