UniFusion: Vision-Language Model as Unified Encoder in Image Generation
