Context-aware Synthesis and Placement of Object Instances

Lee, Donghoon, Liu, Sifei, Gu, Jinwei, Liu, Ming-Yu, Yang, Ming-Hsuan, Kautz, Jan

Neural Information Processing Systems 

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. Solving it requires (a) determining a location to place an object in the scene and (b) determining its appearance at that location. Such an object insertion model can potentially facilitate numerous image editing and scene parsing applications. In this paper, we propose an end-to-end trainable neural network for the task of inserting an object instance mask of a specified class into the semantic label map of an image. Our network consists of two generative modules: one determines where the inserted object mask should be placed (i.e., its location and scale), and the other determines what the object mask shape (and pose) should look like.
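To make the task concrete, the following minimal sketch illustrates only the final composition step: pasting a binary instance mask of a given class into a semantic label map at a given location and scale. It is not the proposed network. The function names (`resize_nearest`, `insert_instance`), the box parameterization (`top`, `left`, `height`, `width`), and the nearest-neighbour resizing are assumptions made for illustration; in the setting described above, the box would come from the "where" module and the mask shape from the "what" module.

```python
import numpy as np


def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (hypothetical helper)."""
    in_h, in_w = mask.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]


def insert_instance(label_map, shape_mask, class_id, top, left, height, width):
    """Paste a binary shape mask into a copy of a semantic label map.

    (top, left, height, width) stands in for a predicted location and scale;
    shape_mask stands in for a predicted object shape. Both are assumptions
    of this sketch, not the paper's interface.
    """
    out = label_map.copy()
    img_h, img_w = out.shape
    resized = resize_nearest(shape_mask.astype(bool), height, width)

    # Clip the target box to the image bounds.
    r0, c0 = max(top, 0), max(left, 0)
    r1, c1 = min(top + height, img_h), min(left + width, img_w)
    if r0 >= r1 or c0 >= c1:
        return out  # box lies entirely outside the image

    # Take the part of the resized mask that falls inside the image.
    crop = resized[r0 - top:r1 - top, c0 - left:c1 - left]
    region = out[r0:r1, c0:c1]  # view into `out`, so writes go through
    region[crop] = class_id
    return out


if __name__ == "__main__":
    label_map = np.zeros((6, 8), dtype=np.int64)   # toy label map, all class 0
    shape_mask = np.array([[0, 1], [1, 1]])        # toy 2x2 object shape
    edited = insert_instance(label_map, shape_mask, class_id=5,
                             top=2, left=3, height=4, width=4)
    print(edited)
```

The input label map is copied rather than modified in place, so the same scene can be reused to try several candidate placements.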