Goto

Collaborating Authors

 id fidelity



PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Neural Information Processing Systems

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (\eg, background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible.


PuLID: Pure and Lightning ID Customization via Contrastive Alignment Zinan Guo Y anze Wu Zhuowei Chen Lang Chen Peng Zhang Qian He ByteDance Inc

Neural Information Processing Systems

Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements ( e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible.


PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Neural Information Processing Systems

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (\eg, background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible.


PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Li, Zhen, Cao, Mingdeng, Wang, Xintao, Qi, Zhongang, Cheng, Ming-Ming, Shan, Ying

arXiv.org Artificial Intelligence

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Under the nourishment of the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/