 hierarchical variational autoencoder


Deep Learning Models for Physical Layer Communications

arXiv.org Artificial Intelligence

The increased availability of data and computing resources has enabled researchers to successfully adopt machine learning (ML) techniques and make significant contributions in several engineering areas. ML algorithms, and deep learning (DL) algorithms in particular, have been shown to perform better in tasks where a physical bottom-up description of the phenomenon is lacking and/or mathematically intractable. Indeed, they take advantage of observations of natural phenomena to automatically acquire knowledge and learn internal relations. Despite its historically model-based mindset, communications engineering has recently started shifting its focus towards top-down data-driven learning models, especially in domains such as channel modeling and physical layer design, where in most cases no general optimal strategies are known. In this thesis, we aim to solve some fundamental open challenges in physical layer communications by exploiting new DL paradigms. In particular, we mathematically formulate, in ML terms, classic problems such as channel capacity and optimal coding-decoding schemes for an arbitrary communication medium. We design and develop the architecture, algorithm and code necessary to train the equivalent DL model, and finally, we propose novel solutions to long-standing problems in the field.


CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data

arXiv.org Artificial Intelligence

Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modalities, which could significantly enhance modeling and interpretation. We propose a novel probabilistic learning framework that explicitly incorporates conditional independence relationships between multi-modal data as a directed acyclic graph using a generalized hierarchical variational autoencoder. We demonstrate the versatility of our framework across various applications pertinent to single-cell multi-omics data integration. These include the isolation of common and distinct information from different modalities, modality-specific differential analysis, and integrated cell clustering. We anticipate that the proposed framework can facilitate the construction of highly flexible graphical models that can capture the complexities of biological hypotheses and unravel the connections between different biological data types, such as different modalities of paired single-cell multi-omics data. The implementation of the proposed framework can be found in the repository https://github.com/kuijjerlab/CAVACHON.


Attention Based Molecule Generation via Hierarchical Variational Autoencoder

arXiv.org Artificial Intelligence

Molecule generation is a task made very difficult by the complex ways in which we represent molecules computationally. A common technique in molecular generative modeling is to use SMILES strings with recurrent neural networks built into variational autoencoders, but these suffer from a myriad of issues: vanishing gradients, long-range forgetting, and invalid molecules. In this work, we show that by combining recurrent neural networks with convolutional networks in a hierarchical manner, we are able to extract autoregressive information from SMILES strings while maintaining signal and long-range dependencies. This allows for generations with very high validity rates, on the order of 95%, when reconstructing known molecules. We also observe an average Tanimoto similarity of 0.6 between test set and reconstructed molecules, which suggests our method is able to map between SMILES strings and their learned representations more effectively than prior works using similar methods.
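As a rough illustration of the Tanimoto similarity reported above, the sketch below computes it for two binary fingerprints stored as sets of on-bit indices. The fingerprints and the function name `tanimoto` are made up for illustration; the abstract does not say which fingerprint type or toolkit the authors used.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    # |A ∩ B| / |A ∪ B|, with the union size obtained by inclusion-exclusion.
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints of a test molecule and its reconstruction.
fp_original = {1, 4, 7, 9, 12}
fp_reconstructed = {1, 4, 7, 10, 13}

score = tanimoto(fp_original, fp_reconstructed)
print(round(score, 3))  # prints 0.429 (3 shared bits out of 7 distinct bits)
```

A score of 1.0 means the two fingerprints are identical and 0.0 means they share no bits, so an average of 0.6 indicates substantial but not exact overlap.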


Diverse super-resolution with pretrained deep hierarchical VAEs

arXiv.org Artificial Intelligence

We investigate the problem of producing diverse solutions to an image super-resolution problem. From a probabilistic perspective, this can be done by sampling from the posterior distribution of an inverse problem, which requires the definition of a prior distribution on the high-resolution images. In this work, we propose to use a pretrained hierarchical variational autoencoder (HVAE) as a prior. We train a lightweight stochastic encoder to encode low-resolution images in the latent space of a pretrained HVAE. At inference, we combine the low-resolution encoder and the pretrained generative model to super-resolve an image. We demonstrate on the task of face super-resolution that our method provides an advantageous trade-off between the computational efficiency of conditional normalizing flows techniques and the sample quality of diffusion based methods.


Inverse problem regularization with hierarchical variational autoencoders

arXiv.org Artificial Intelligence

In this paper, we propose to regularize ill-posed inverse problems using a deep hierarchical variational autoencoder (HVAE) as an image prior. The proposed method synthesizes the advantages of i) denoiser-based Plug-and-Play (PnP) approaches and ii) generative model based approaches to inverse problems. First, we exploit VAE properties to design an efficient algorithm that benefits from the convergence guarantees of PnP methods. Second, our approach is not restricted to specialized datasets, and the proposed PnP-HVAE model is able to solve image restoration problems on natural images of any size. Our experiments show that the proposed PnP-HVAE method is competitive with both SOTA denoiser-based PnP approaches and other SOTA restoration methods based on generative models.


Understanding Diffusion Models: A Unified Perspective

arXiv.org Artificial Intelligence

Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
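The equivalence of the three prediction targets mentioned above can be checked numerically. The sketch below assumes the standard Gaussian forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps; the symbol `alpha_bar` and all function names are assumptions for illustration and are not taken from the abstract. It uses the true noise in place of a trained network, so it only demonstrates the algebraic conversions between the source input, the source noise, and the score of q(x_t | x0).

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


def score_from_eps(eps, alpha_bar):
    """Score of q(x_t | x0) written in terms of the noise: -eps / sqrt(1 - alpha_bar)."""
    return [-e / math.sqrt(1.0 - alpha_bar) for e in eps]


def x0_from_eps(x_t, eps, alpha_bar):
    """Recover x0 by inverting the forward process given the noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps)]


def x0_from_score(x_t, score, alpha_bar):
    """Tweedie-style recovery: x0 = (x_t + (1 - alpha_bar) * score) / sqrt(alpha_bar)."""
    a = math.sqrt(alpha_bar)
    return [(x + (1.0 - alpha_bar) * s) / a for x, s in zip(x_t, score)]


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]   # toy "clean" sample
alpha_bar = 0.7         # hypothetical cumulative signal level at some step t

x_t, eps = forward_noise(x0, alpha_bar, rng)
score = score_from_eps(eps, alpha_bar)

rec_from_eps = x0_from_eps(x_t, eps, alpha_bar)
rec_from_score = x0_from_score(x_t, score, alpha_bar)

# Both routes should reproduce x0 up to floating-point error.
max_err = max(
    max(abs(r - x) for r, x in zip(rec_from_eps, x0)),
    max(abs(r - x) for r, x in zip(rec_from_score, x0)),
)
print(max_err < 1e-9)
```

In a trained model the exact quantities above are replaced by network predictions, so the three parameterizations agree only up to the network's approximation error.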


Top-down inference in an early visual cortex inspired hierarchical Variational Autoencoder

arXiv.org Machine Learning

Interpreting computations in the visual cortex as learning and inference in a generative model of the environment has received wide support in both neuroscience and cognitive science. However, hierarchical computations, a hallmark of visual cortical processing, have remained out of reach for generative models because of a lack of adequate tools to address them. Here we capitalize on advances in Variational Autoencoders (VAEs) to investigate the early visual cortex with sparse coding hierarchical VAEs trained on natural images. We design alternative architectures that vary in both the generative and the recognition components of the two-latent-layer VAE. We show that representations similar to those found in the primary and secondary visual cortices naturally emerge under mild inductive biases. Importantly, a nonlinear representation for texture-like patterns is a stable property of the high-level latent space that is robust to the specific architecture of the VAE, reminiscent of the secondary visual cortex. We show that a neuroscience-inspired choice of the recognition model, which features a top-down processing component, is critical for two signatures of computations with generative models: learning higher order moments of the posterior beyond the mean, and image inpainting. Patterns in higher order response statistics provide inspiration for neuroscience to interpret response correlations and for machine learning to evaluate the learned representations through a more detailed characterization of the posterior.