instantid
Generating Synthetic Data via Augmentations for Improved Facial Resemblance in DreamBooth and InstantID
Ulusan, Koray, Kiefer, Benjamin
Personalizing Stable Diffusion for professional portrait generation from amateur photos faces challenges in maintaining facial resemblance. This paper evaluates the impact of augmentation strategies on two personalization methods: DreamBooth and InstantID. W e compare classical augmentations (flipping, cropping, color adjustments) with generative augmentation using InstantID's synthetic images to enrich training data. Using SDXL and a new FaceDistance metric based on FaceNet, we quantitatively assess facial similarity. Results show classical augmentations can cause artifacts harming identity retention, while InstantID improves fidelity when balanced with real images to avoid overfitting. A user study with 97 participants confirms high photorealism and preferences for InstantID's polished look versus DreamBooth's identity accuracy. Our findings inform effective augmentation strategies for personalized text-to-image generation.
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Wang, Qixun, Bai, Xu, Wang, Haofan, Qin, Zekui, Chen, Anthony, Li, Huaxia, Tang, Xu, Hu, Yao
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount.
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Research Report > Promising Solution (0.46)
- Research Report > New Finding (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)