We present a theoretical analysis of Gaussian-binary restricted Boltzmann machines (GRBMs) from the perspective of density models. The key aspect of this analysis is to show that GRBMs can be formulated as a constrained mixture of Gaussians, which gives a much better insight into the model's capabilities and limitations. We show that GRBMs are capable of learning meaningful features both in a two-dimensional blind source separation task and in modeling natural images. Further, we show that reported difficulties in training GRBMs are due to the failure of the training algorithm rather than the model itself. Based on our analysis we are able to propose several training recipes, which allowed successful and fast training in our experiments. Finally, we discuss the relationship of GRBMs to several modifications that have been proposed to improve the model.
With the increasing variety of services that e-commerce platforms provide, criteria for evaluating their success become also increasingly multi-targeting. This work introduces a multi-target optimization framework with Bayesian modeling of the target events, called Deep Bayesian Multi-Target Learning (DBMTL). In this framework, target events are modeled as forming a Bayesian network, in which directed links are parameterized by hidden layers, and learned from training samples. The structure of Bayesian network is determined by model selection. We applied the framework to Taobao live-streaming recommendation, to simultaneously optimize (and strike a balance) on targets including click-through rate, user stay time in live room, purchasing behaviors and interactions. Significant improvement has been observed for the proposed method over other MTL frameworks and the non-MTL model. Our practice shows that with an integrated causality structure, we can effectively make the learning of a target benefit from other targets, creating significant synergy effects that improve all targets. The neural network construction guided by DBMTL fits in with the general probabilistic model connecting features and multiple targets, taking weaker assumption than the other methods discussed in this paper. This theoretical generality brings about practical generalization power over various targets distributions, including sparse targets and continuous-value ones.
Scalable Bayesian sampling is playing an important role in modern machine learning, especially in the fast-developed unsupervised-(deep)-learning models. While tremendous progresses have been achieved via scalable Bayesian sampling such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD), the generated samples are typically highly correlated. Moreover, their sample-generation processes are often criticized to be inefficient. In this paper, we propose a novel self-adversarial learning framework that automatically learns a conditional generator to mimic the behavior of a Markov kernel (transition kernel). High-quality samples can be efficiently generated by direct forward passes though a learned generator. Most importantly, the learning process adopts a self-learning paradigm, requiring no information on existing Markov kernels, e.g., knowledge of how to draw samples from them. Specifically, our framework learns to use current samples, either from the generator or pre-provided training data, to update the generator such that the generated samples progressively approach a target distribution, thus it is called self-learning. Experiments on both synthetic and real datasets verify advantages of our framework, outperforming related methods in terms of both sampling efficiency and sample quality.
Bayesian probabilistic models provide a nimble and expressive framework for modeling "small-world" data. In contrast, deep learning offers a more rigid yet much more powerful framework for modeling data of massive size. Edward is a probabilistic programming library that bridges this gap: "black-box" variational inference enables us to fit extremely flexible Bayesian models to large-scale data. Furthermore, these models themselves may take advantage of classic deep-learning architectures of arbitrary complexity. Edward uses TensorFlow for symbolic gradients and data flow graphs.
In this paper we present a framework for using multi-layer perceptron (MLP)networks in nonlinear generative models trained by variational Bayesian learning. The nonlinearity is handled by linearizing it using a Gauss-Hermite quadrature at the hidden neurons. Thisyields an accurate approximation for cases of large posterior variance.The method can be used to derive nonlinear counterparts forlinear algorithms such as factor analysis, independent component/factor analysis and state-space models. This is demonstrated witha nonlinear factor analysis experiment in which even 20 sources can be estimated from a real world speech data set.