Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay
arXiv.org Artificial Intelligence
Figure 1 (right): A mood-board created by a professional designer using Gaudí for the given project briefing: "You're designing a new eco-friendly, high-end coffee brand that is notorious for its floral flavors." All images are from the BAM dataset [6].

Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, designers typically create thematic image collections called "mood-boards" (example shown in Figure 1) to elicit and clarify a client's preferred creative direction. Creating a mood-board involves a sequence of image searches, which are currently performed using keyword or image queries. Gaudí turns this process into a conversation in which the user gradually details the mood-board's theme. Representing the theme this way allows our AI to generate new search queries from scratch, straight from a project's briefing, following a hypothesized mood.

Previous computational approaches to this process tend to oversimplify the decision space, defining it by hard-coded qualities such as dominant color, saturation, and brightness [3, 2]. Recent advances in realistic language modeling (e.g., with GPT-3 [1]) and cross-modal image retrieval (e.g., with CLIP [5]) now allow us to represent image collections in a much richer semantic space, capturing the wider variation in the stories designers tell when presenting a creative direction to a client.
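In a CLIP-style cross-modal setup, a text query and the candidate images are embedded in a shared vector space, and images are ranked by how close their embeddings are to the query's. The sketch below illustrates that general retrieval step only; it is not Gaudí's implementation. It assumes the embeddings have already been computed: the three-dimensional vectors and the image ids are hypothetical stand-ins (real CLIP embeddings are much higher-dimensional), and cosine similarity is used as the ranking score.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return up to top_k image ids, most similar to the text query first.

    image_embeddings maps a (hypothetical) image id to a precomputed
    embedding living in the same space as query_embedding.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), image_id)
        for image_id, emb in image_embeddings.items()
    ]
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Toy, hand-made vectors standing in for real model outputs (assumption).
query = [0.9, 0.1, 0.0]  # e.g. a query generated from the coffee-brand briefing
images = {
    "img_flowers": [0.8, 0.2, 0.1],
    "img_forest": [0.5, 0.5, 0.0],
    "img_city": [0.0, 0.1, 0.9],
}

print(rank_images(query, images, top_k=2))  # ['img_flowers', 'img_forest']
```

In a conversational loop, each refinement of the theme would yield a new query embedding, and the same ranking step would be rerun against the image index.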
Dec-5-2021