

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Neural Information Processing Systems

Diffusion models perform remarkably well at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training a diffusion model on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With the proposed pipeline, PaGoDA achieves a $64\times$ reduction in cost by training its diffusion model on $8\times$ downsampled data, while at inference, with a single step, it achieves state-of-the-art performance on ImageNet across all resolutions from $64\times64$ to $512\times512$, as well as in text-to-image generation. PaGoDA's pipeline can also be applied directly in latent space, adding compression alongside the pre-trained autoencoder of Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
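The $64\times$ figure follows directly from the $8\times$ spatial downsampling: with $8\times$ fewer pixels along each axis, the teacher trains on $64\times$ less data per image. A minimal sketch of that first stage, using a hypothetical average-pooling downsampler (the function name and pooling choice are assumptions for illustration, not PaGoDA's actual preprocessing):

```python
import numpy as np

def downsample(img: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average-pool an (H, W, C) image by `factor` along each spatial axis."""
    h, w, c = img.shape
    assert h % factor == 0 and w % factor == 0
    # Split each axis into (blocks, factor) and average within each block.
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

hi_res = np.random.rand(512, 512, 3)
lo_res = downsample(hi_res)  # (64, 64, 3): what the diffusion teacher trains on
reduction = hi_res[..., 0].size / lo_res[..., 0].size
print(lo_res.shape, reduction)  # (64, 64, 3) 64.0
```

The per-image pixel count drops from $512\times512$ to $64\times64$, which is the source of the quoted cost reduction, assuming training cost scales roughly with pixel count.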






Interview with Yuki Mitsufuji: Improving AI image generation

AIHub

Yuki Mitsufuji is a Lead Research Scientist at Sony AI. Yuki and his team presented two papers at the recent Conference on Neural Information Processing Systems (NeurIPS 2024). These works tackle different aspects of image generation and are entitled: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping and PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher. We caught up with Yuki to find out more about this research. The problem we aimed to solve is called single-shot novel view synthesis, which is where you have one image and want to create another image of the same scene from a different camera angle. There has been a lot of work in this space, but a major challenge remains: when the camera angle changes substantially, the image quality degrades significantly.


PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Kim, Dongjun, Lai, Chieh-Hsin, Liao, Wei-Hsiang, Takida, Yuhta, Murata, Naoki, Uesaka, Toshimitsu, Mitsufuji, Yuki, Ermon, Stefano

arXiv.org Machine Learning

To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique that progressively grows the resolution of the generator beyond that of the original teacher DM. Our key insight is that a pre-trained, low-resolution DM can be used to deterministically encode high-resolution data into a structured latent space by solving the PF-ODE forward in time (data-to-noise), starting from an appropriately down-sampled image. Using this frozen encoder in an autoencoder framework, we train a decoder by progressively growing its resolution. Owing to the nature of the progressively growing decoder, PaGoDA avoids re-training the teacher/student models when upsampling the student model, making the whole training pipeline much cheaper. In experiments, we used our progressively growing decoder to upsample from the pre-trained model's 64x64 resolution to generate 512x512 samples, achieving 2x faster inference than single-step distilled Stable Diffusion models such as LCM. PaGoDA also achieved state-of-the-art FIDs on ImageNet across all resolutions from 64x64 to 512x512. Additionally, we demonstrated PaGoDA's effectiveness in solving inverse problems and enabling controllable generation.
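The data-to-noise encoding described above can be illustrated with a toy probability-flow ODE. The drift below is a stand-in linear function, not the teacher DM's actual score-based drift; only the structure reflects the paper's idea: integrating an ODE forward in time involves no sampling, so the encoding is deterministic, which is what lets the frozen encoder anchor the autoencoder framework.

```python
import numpy as np

def encode_pf_ode(x0, drift, t0=0.0, t1=1.0, steps=100):
    """Deterministically map data to a latent by Euler-integrating an
    (assumed) probability-flow ODE dx/dt = drift(x, t) forward in time."""
    x, dt = x0.astype(float), (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * dt
        x = x + dt * drift(x, t)  # plain forward-Euler step, no noise injected
    return x

# Toy linear drift standing in for the teacher DM's score-based drift.
drift = lambda x, t: -0.5 * x
x0 = np.array([1.0, -2.0, 0.5])
z1, z2 = encode_pf_ode(x0, drift), encode_pf_ode(x0, drift)
print(np.allclose(z1, z2))  # True: same input always yields the same latent
```

In PaGoDA itself the drift would come from the pre-trained low-resolution DM and the input would be an appropriately down-sampled image; everything else here is an assumption for illustration.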


Computing CQ lower-bounds over OWL 2 through approximation to RSA

Igne, Federico, Germano, Stefano, Horrocks, Ian

arXiv.org Artificial Intelligence

Conjunctive query (CQ) answering over knowledge bases is an important reasoning task. However, with expressive ontology languages such as OWL, query answering is computationally very expensive. The PAGOdA system addresses this issue by using a tractable reasoner to compute lower- and upper-bound approximations, falling back to a fully-fledged OWL reasoner only when these bounds do not coincide. The effectiveness of this approach critically depends on the quality of the approximations, and in this paper we explore a technique for computing closer approximations via RSA, an ontology language that subsumes all the OWL 2 profiles while still maintaining tractability. We present a novel approximation of OWL 2 ontologies into RSA, and an algorithm to compute a closer (than PAGOdA) lower-bound approximation using the RSA combined approach. We have implemented these algorithms in a prototypical CQ answering system, and we present a preliminary evaluation of our system that shows significant performance improvements with respect to PAGOdA.
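The lower/upper-bound strategy that PAGOdA pioneered, and that this work tightens, can be sketched abstractly. The reasoner interfaces below are hypothetical; the point is only the control flow: answers certified by the tractable lower bound are returned cheaply, and the expensive fully-fledged reasoner is invoked only on the gap between the bounds.

```python
def answer_query(query, lower_reasoner, upper_reasoner, full_reasoner):
    """Bound-based CQ answering sketch (assumed interfaces, not PAGOdA's API)."""
    lower = set(lower_reasoner(query))   # cheap, sound: every answer is correct
    upper = set(upper_reasoner(query))   # cheap, complete: no answer is missed
    if lower == upper:
        return lower                     # bounds coincide: answers are exact
    gap = upper - lower                  # only these candidates need checking
    return lower | {a for a in gap if full_reasoner(query, a)}

# Toy stand-ins for the real reasoners (illustrative values only).
lower = lambda q: {1, 2}
upper = lambda q: {1, 2, 3, 4}
full = lambda q, a: a == 3               # expensive check, called only on {3, 4}
print(answer_query("q", lower, upper, full))  # {1, 2, 3}
```

A closer lower bound, as computed here via RSA, shrinks the gap and hence the number of calls to the expensive reasoner, which is where the performance improvement comes from.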


Coffee with Sasquatch and a Couple of Robots

The New Yorker

In 2011, Don Moyer, a retired graphic designer, inherited a Blue Willow plate from his grandmother. He lives in Washington, and draws every day. "I got this plate and I was studying it, and I really kind of liked it," he said. "The design was very busy, like doodling--no place was at rest." At the end, for no particular reason, he added a small pterodactyl.


PAGODA: A Model for

AI Magazine

The system consists of an overall agent architecture and five components within the architecture. The five components are (1) goal-directed learning (GDL), a decision-theoretic method for selecting learning goals; (2) probabilistic bias evaluation (PBE), a technique for using probabilistic background knowledge to select learning biases for the learning goals; (3) uniquely predictive theories (UPTs) and probability computation using independence (PCI), a probabilistic representation and Bayesian inference method for the agent's theories; (4) a probabilistic learning component, consisting of a heuristic search algorithm and a Bayesian method for evaluating proposed theories; and (5) a decision-theoretic probabilistic planner, which searches through the probability space defined by the agent's current theory to select the best action. An autonomous agent must be able to select biases (Mitchell 1980) for new learning tasks as they arise. PBE uses probabilistic background knowledge and a model of the system's expected learning performance to compute the expected value of learning biases for each learning goal. The resulting expected discounted future accuracy is used as the expected value of the bias.
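Component (5), the decision-theoretic planner, reduces to expected-utility maximization over the agent's current probabilistic theory. A minimal sketch with toy outcomes, probabilities, and utilities (all names and numbers are illustrative, not PAGODA's actual representation):

```python
def select_action(actions, outcomes, prob, utility):
    """Pick the action maximizing expected utility under the agent's
    current theory (sketch; interfaces are assumptions)."""
    expected = lambda a: sum(prob(a, o) * utility(o) for o in outcomes)
    return max(actions, key=expected)

# Toy theory: two actions, two outcomes, illustrative probabilities.
outcomes = ["success", "failure"]
table = {"safe": {"success": 0.6, "failure": 0.4},
         "risky": {"success": 0.3, "failure": 0.7}}
prob = lambda a, o: table[a][o]
utility = lambda o: 10 if o == "success" else 0
print(select_action(["safe", "risky"], outcomes, prob, utility))  # safe
```

Here the expected utilities are 6 for "safe" and 3 for "risky", so the planner selects "safe"; in PAGODA the probabilities would instead come from the agent's learned theory.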


PAGOdA: Pay-As-You-Go Ontology Query Answering Using a Datalog Reasoner

Zhou, Yujiao, Cuenca Grau, Bernardo, Nenov, Yavor, Kaminski, Mark, Horrocks, Ian

Journal of Artificial Intelligence Research

Answering conjunctive queries over ontology-enriched datasets is a core reasoning task for many applications. Query answering is, however, computationally very expensive, which has led to the development of query answering procedures that sacrifice either expressive power of the ontology language, or the completeness of query answers in order to improve scalability. In this paper, we describe a hybrid approach to query answering over OWL 2 ontologies that combines a datalog reasoner with a fully-fledged OWL 2 reasoner in order to provide scalable `pay-as-you-go' performance. The key feature of our approach is that it delegates the bulk of the computation to the datalog reasoner and resorts to expensive OWL 2 reasoning only as necessary to fully answer the query. Furthermore, although our main goal is to efficiently answer queries over OWL 2 ontologies and data, our technical results are very general and our approach is applicable to first-order knowledge representation languages that can be captured by rules allowing for existential quantification and disjunction in the head; our only assumption is the availability of a datalog reasoner and a fully-fledged reasoner for the language of interest, both of which are used as `black boxes'. We have implemented our techniques in the PAGOdA system, which combines the datalog reasoner RDFox and the OWL 2 reasoner HermiT. Our extensive evaluation shows that PAGOdA succeeds in providing scalable pay-as-you-go query answering for a wide range of OWL 2 ontologies, datasets and queries.