Generative Powers of Ten
Wang, Xiaojuan, Kontkanen, Janne, Curless, Brian, Seitz, Steve, Kemelmacher, Ira, Mildenhall, Ben, Srinivasan, Pratul, Verbin, Dor, Holynski, Aleksander
–arXiv.org Artificial Intelligence
We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
arXiv.org Artificial Intelligence
Dec-4-2023
- Country:
- North America > United States (0.16)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment (0.68)
- Media > Film (0.68)
- Technology: