Designing Random Graph Models Using Variational Autoencoders With Applications to Chemical Design
Samanta, Bidisha, De, Abir, Ganguly, Niloy, Gomez-Rodriguez, Manuel
Figure: From left to right, given a graph $\mathcal{G}$ with a set of node features $F$ and edge weights $Y$, the encoder aggregates information from a different number of hops $j \le K$ away for each node $v \in \mathcal{G}$ into an embedding vector $\mathbf{c}_v(j)$. To do so, it uses a feedforward network to propagate information between different search depths, parametrized by a set of weight matrices $W_j$. These embedding vectors are then fed into a differentiable function $\varphi_{enc}$, which sets the parameters $\mu_k$ and $\sigma_k$ of several multidimensional Gaussian distributions $q_\phi$, from which the latent representation of each node in the input graph is sampled.

Variational autoencoders are characterized by a probabilistic generative model $p_\theta(x \mid z)$ of the observed variables $x \in \mathbb{R}^N$ given the latent variables $z \in \mathbb{R}^M$, a prior distribution over the latent variables $p(z)$, and an approximate probabilistic inference model $q_\phi(z \mid x)$. In this characterization, $p_\theta$ and $q_\phi$ are arbitrary distributions parametrized by two (deep) neural networks $\theta$ and $\phi$, and one can think of the generative model as a probabilistic decoder, which decodes latent variables into observed variables, and of the inference model as a probabilistic encoder, which encodes observed variables into latent variables. Ideally, if we use the maximum likelihood principle to train a variational autoencoder, we should optimize the marginal log-likelihood of the observed data, i.e., $\mathbb{E}_{p_D}[\log p_\theta(x)]$, where $p_D$ is the data distribution. Unfortunately, computing $\log p_\theta(x)$ requires marginalization with respect to the latent variable $z$, which is typically intractable. Therefore, one resorts to maximizing a variational lower bound, or evidence lower bound (ELBO), of the log-likelihood of the observed data, i.e.,
$$\max_\theta \max_\phi \; \mathbb{E}_{p_D}\!\left[ -\mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) + \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \right].$$
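The ELBO objective above can be sketched concretely. The following is a minimal Monte Carlo estimator, assuming $q_\phi(z \mid x)$ is a diagonal Gaussian and the prior $p(z)$ is a standard normal, so the KL term has a closed form; `decoder_log_likelihood` is a hypothetical stand-in for $\log p_\theta(x \mid z)$, not the paper's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo(x, mu, log_sigma, decoder_log_likelihood, n_samples=1):
    """Monte Carlo estimate of the ELBO for one observation x.

    Assumes q_phi(z|x) = N(mu, diag(sigma^2)) and prior p(z) = N(0, I),
    so KL(q || p) is available in closed form. `decoder_log_likelihood`
    is an illustrative placeholder for log p_theta(x|z).
    """
    sigma2 = np.exp(2.0 * log_sigma)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions
    kl = 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2.0 * log_sigma)
    # Reparameterized samples z = mu + sigma * eps keep the estimate
    # differentiable with respect to (mu, log_sigma)
    recon = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(log_sigma) * eps
        recon += decoder_log_likelihood(x, z)
    recon /= n_samples
    # ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))
    return recon - kl
```

With `mu = 0` and `log_sigma = 0` the KL term vanishes, so the estimate reduces to the reconstruction term alone; this matches the intuition that the bound is tight only when the approximate posterior matches the prior perfectly on those terms.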
Finally, note that the quality of this variational lower bound (and thus of the resulting VAE) depends on the expressive ability of the approximate inference model $q_\phi(z \mid x)$, which is typically assumed to be a normal distribution whose mean and variance are parametrized by a (deep) neural network $\phi$ with the observed data $x$ as input.
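Combining the figure's description with this Gaussian inference model, a simplified sketch of such an encoder over a graph might look as follows. Here embeddings are propagated for $K$ search depths with per-depth weight matrices $W_j$ and then mapped to a per-node mean and log standard deviation; the mean-pooling aggregation and tanh nonlinearity are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_graph_encoder(n_feat, n_embed, n_latent, K):
    """A simplified Gaussian inference network q_phi(z|x) over a graph.

    For each node v, embeddings are refined over K propagation steps
    (one weight matrix W_j per search depth), then mapped to the mean and
    log standard deviation of a diagonal Gaussian per node. This is an
    illustrative sketch, not the paper's exact aggregation rule.
    """
    W = [rng.standard_normal((n_embed, n_feat if j == 0 else n_embed)) * 0.1
         for j in range(K)]
    W_mu = rng.standard_normal((n_latent, n_embed)) * 0.1
    W_ls = rng.standard_normal((n_latent, n_embed)) * 0.1

    def encode(F, A):
        # F: (n_nodes, n_feat) node features; A: (n_nodes, n_nodes) adjacency
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        H = F
        for Wj in W:                        # one propagation step per depth j
            H = np.tanh((A @ H) / deg @ Wj.T)  # mean over neighbors, then W_j
        return H @ W_mu.T, H @ W_ls.T       # per-node mu and log_sigma

    return encode

encode = make_graph_encoder(n_feat=3, n_embed=8, n_latent=4, K=2)
F = rng.standard_normal((5, 3))
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.maximum(A, A.T)                      # symmetrize: undirected graph
np.fill_diagonal(A, 0.0)
mu, log_sigma = encode(F, A)                # each of shape (n_nodes, n_latent)
```

Sampling the latent representation of each node then uses the same reparameterization as above: $z_v = \mu_v + \sigma_v \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.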
Feb-14-2018