FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models, Ryan Po

Neural Information Processing Systems 

Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution is to adopt favorable attributes from source images. Current methods attempt to distill identity and style from source images. Yet "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics.