Universal Captioner: Long-Tail Vision-and-Language Model Training through Content-Style Separation
