Unifying Vision-and-Language Tasks via Text Generation

Open in new window