DisCoCLIP: A Distributional Compositional Tensor Network Encoder for Vision-Language Understanding

Open in new window