Generative Powers of Ten

Wang, Xiaojuan, Kontkanen, Janne, Curless, Brian, Seitz, Steve, Kemelmacher, Ira, Mildenhall, Ben, Srinivasan, Pratul, Verbin, Dor, Holynski, Aleksander

Dec-4-2023, arXiv.org Artificial Intelligence

We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
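The abstract does not spell out how consistency across scales is encouraged, so the following is only a toy sketch of the constraint itself, not the paper's sampler. It assumes a "zoom stack" in which each level shows the central 1/p region of the previous level, so the centre crop of a coarser image should match the next finer image downsampled by p. The zoom factor `p = 2`, the box-filter downsampling, and all function names are illustrative assumptions.

```python
import numpy as np


def downsample(img, p):
    """Box-filter downsample a (H, W) array by an integer factor p."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).mean(axis=(1, 3))


def center_slice(size, p):
    """Slice covering the central size // p entries of an axis of length size."""
    inner = size // p
    start = (size - inner) // 2
    return slice(start, start + inner)


def enforce_consistency(stack, p=2):
    """Return a copy of the stack in which the centre of each coarser level
    is overwritten by the downsampled next-finer level (assumed layout)."""
    out = [img.copy() for img in stack]
    # Walk from the finest level to the coarsest so updates propagate upward.
    for i in range(len(out) - 2, -1, -1):
        h, w = out[i].shape
        out[i][center_slice(h, p), center_slice(w, p)] = downsample(out[i + 1], p)
    return out


def consistency_error(stack, p=2):
    """Largest absolute mismatch between any centre crop and the
    downsampled next-finer level."""
    err = 0.0
    for coarse, fine in zip(stack, stack[1:]):
        h, w = coarse.shape
        crop = coarse[center_slice(h, p), center_slice(w, p)]
        err = max(err, float(np.abs(crop - downsample(fine, p)).max()))
    return err


# Three independent random "images" stand in for per-scale generations.
rng = np.random.default_rng(0)
stack = [rng.random((8, 8)) for _ in range(3)]
fixed = enforce_consistency(stack)
```

In a diffusion setting one could imagine applying a step like `enforce_consistency` to the intermediate estimates at each sampling iteration, with each level denoised under its own text prompt; whether and how the authors do this is not stated in the abstract.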

diffusion model, text prompt, zoom level

Country:
- Pacific Ocean (0.04)
- North America > United States
  - Hawaii (0.06)
- Asia > Japan
  - Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)

Genre:
- Research Report (0.64)

Industry:
- Media > Film (0.68)
- Leisure & Entertainment (0.68)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.94)
  - Artificial Intelligence
    - Vision (1.00)
    - Machine Learning > Neural Networks (0.70)
    - Natural Language > Large Language Model (0.47)