Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
