visible state
Investigating the generative dynamics of energy-based neural networks
Tausani, Lorenzo, Testolin, Alberto, Zorzi, Marco
Generative neural networks can produce data samples according to the statistical properties of their training distribution. This feature can be used to test modern computational neuroscience hypotheses suggesting that spontaneous brain activity is partially supported by top-down generative processing. A widely studied class of generative models is that of Restricted Boltzmann Machines (RBMs), which can be used as building blocks for unsupervised deep learning architectures. In this work, we systematically explore the generative dynamics of RBMs, characterizing the number of states visited during top-down sampling and investigating whether the heterogeneity of visited attractors could be increased by starting the generation process from biased hidden states. By considering an RBM trained on a classic dataset of handwritten digits, we show that the capacity to produce diverse data prototypes can be increased by initiating top-down sampling from chimera states, which encode high-level visual features of multiple digits. We also found that the model is not capable of transitioning between all possible digit states within a single generation trajectory, suggesting that the top-down dynamics is heavily constrained by the shape of the energy function.
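The top-down sampling described in this abstract can be illustrated with a minimal sketch of alternating Gibbs sampling in a binary RBM, started from a chosen hidden state. It is not the authors' code: the function name `sample_top_down`, the toy layer sizes, and the untrained random weights are all assumptions for illustration, and counting distinct visible vectors is only a crude stand-in for the attractor analysis the abstract mentions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_top_down(W, b_vis, b_hid, h0, n_steps, rng):
    """Alternate h -> v -> h Gibbs steps, starting from hidden state h0.

    W has shape (n_visible, n_hidden). Returns the list of binary
    visible states visited along the trajectory.
    """
    h = h0.copy()
    visited = []
    for _ in range(n_steps):
        # Top-down pass: sample visible units given the hidden state.
        p_v = sigmoid(W @ h + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(np.int8)
        visited.append(v)
        # Bottom-up pass: sample hidden units given the visible state.
        p_h = sigmoid(W.T @ v + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(np.int8)
    return visited

# Toy setup (hypothetical sizes, untrained weights).
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)
h0 = rng.integers(0, 2, size=n_hidden)  # the "biased" starting hidden state

trajectory = sample_top_down(W, b_vis, b_hid, h0, n_steps=50, rng=rng)
# Number of distinct visible states visited along the trajectory.
n_distinct = len({tuple(v.tolist()) for v in trajectory})
```

With a trained model, `h0` could be replaced by a hidden vector that mixes features of several classes, which is roughly what the abstract calls a chimera state.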
Generative and discriminative training of Boltzmann machine through Quantum annealing
Srivastava, Siddhartha, Sundararaghavan, Veera
A hybrid quantum-classical method for learning Boltzmann machines (BM) for generative and discriminative tasks is presented. Boltzmann machines are undirected graphs with a network of visible and hidden nodes, where the former are used as the reading sites while the latter are used to manipulate the probability of the visible states. In a generative BM, the samples of visible data imitate the probability distribution of a given data set. In contrast, the visible sites of a discriminative BM are treated as Input/Output (I/O) reading sites where the conditional probability of the output state is optimized for a given set of input states. The cost function for learning a BM is defined as a weighted sum of the Kullback-Leibler (KL) divergence and the Negative conditional Log-Likelihood (NCLL), adjusted using a hyperparameter. Here, the KL divergence is the cost for generative learning, and the NCLL is the cost for discriminative learning. A Stochastic Newton-Raphson optimization scheme is presented. The gradients and the Hessians are approximated using direct samples of the BM obtained through Quantum annealing (QA). Quantum annealers are hardware representing the physics of the Ising model that operates at a low but finite temperature. This temperature affects the probability distribution of the BM; however, its value is unknown. Previous efforts have focused on estimating this unknown temperature through regression of the theoretical Boltzmann energies of sampled states against the probability of states sampled by the actual hardware. This assumes that a change in the control parameters does not affect the system temperature, which is not usually the case. Instead, an approach that works on the probability distribution of samples, rather than on the energies, is proposed to estimate the optimal parameter set. This ensures that the optimal set can be obtained from a single run.
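The cost function described in this abstract might be written as follows. The abstract states only that the cost is a weighted sum of the KL divergence and the NCLL adjusted by a hyperparameter, so the convex-combination form, the symbol $\alpha$, and the notation $q$ (data distribution), $p_\theta$ (model distribution), and $\mathcal{D}$ (set of input/output pairs) are assumptions made here for illustration.

```latex
\begin{aligned}
C(\theta) &= \alpha \, D_{\mathrm{KL}}\!\left(q \,\|\, p_\theta\right)
            + (1-\alpha)\, \mathrm{NCLL}(\theta), \qquad \alpha \in [0,1], \\
D_{\mathrm{KL}}\!\left(q \,\|\, p_\theta\right)
          &= \sum_{v} q(v) \ln \frac{q(v)}{p_\theta(v)}, \\
\mathrm{NCLL}(\theta)
          &= -\sum_{(x,\,y) \in \mathcal{D}} \ln p_\theta(y \mid x).
\end{aligned}
```

Under this assumed form, $\alpha = 1$ recovers purely generative learning and $\alpha = 0$ purely discriminative learning.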
Discovering Sparse Interpretable Dynamics from Partial Observations
Lu, Peter Y., Ariño, Joan, Soljačić, Marin
Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. We propose a machine learning framework for discovering these governing equations using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. Our tests show that this method can successfully reconstruct the full system state and identify the underlying dynamics for a variety of ODE and PDE systems.
Convolutional Bipartite Attractor Networks
Iuzzolino, Michael, Singer, Yoram, Mozer, Michael C.
In human perception and cognition, the fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well matched to an early and often overlooked architecture, the attractor network---a recurrent neural network that performs constraint satisfaction, imputation of missing features, and clean up of noisy data via energy minimization dynamics. We revisit attractor nets in light of modern deep learning methods, and propose a convolutional bipartite architecture with a novel training loss, activation function, and connectivity constraints. We tackle problems much larger than have been previously explored with attractor nets and demonstrate their potential for image denoising, completion, and super-resolution. We argue that this architecture is better motivated than ever-deeper feedforward models and is a viable alternative to more costly sampling-based methods on a range of supervised and unsupervised tasks.
From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets
Savarese, Pedro H. P. (Toyota Technological Institute at Chicago) | Kakodkar, Mayank (Purdue University, West Lafayette, IN) | Ribeiro, Bruno (Purdue University, West Lafayette, IN)
We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has a maximum number of Markov chain steps K (referred to as MCLV-K). We present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K), with LVS-K significantly outperforming CD-K when training RBMs on the MNIST dataset, indicating that MCLV is a promising direction in learning generative models.
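For context, the Contrastive Divergence (CD-K) baseline named in this abstract can be sketched as below for a binary RBM. This is the standard CD-K estimator, not the LVS-K estimator the paper proposes (the abstract does not give enough detail to reproduce that). The function name `cd_k_gradient`, the toy sizes, and the random data are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient(W, b_vis, b_hid, v_data, k, rng):
    """CD-K estimate of the log-likelihood gradient for a binary RBM.

    v_data: (batch, n_visible); W: (n_visible, n_hidden).
    Returns gradients with respect to W, b_vis, and b_hid.
    """
    # Positive phase: hidden probabilities given the data.
    p_h_data = sigmoid(v_data @ W + b_hid)
    # Negative phase: run K Gibbs steps starting from the data.
    v = v_data
    p_h = p_h_data
    for _ in range(k):
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)
        p_h = sigmoid(v @ W + b_hid)
    batch = v_data.shape[0]
    # Difference between data-driven and model-driven statistics.
    grad_W = (v_data.T @ p_h_data - v.T @ p_h) / batch
    grad_b_vis = (v_data - v).mean(axis=0)
    grad_b_hid = (p_h_data - p_h).mean(axis=0)
    return grad_W, grad_b_vis, grad_b_hid

# Toy usage with random binary "data" (hypothetical sizes).
rng = np.random.default_rng(0)
v_data = rng.integers(0, 2, size=(8, 6)).astype(float)
W = rng.normal(scale=0.1, size=(6, 4))
b_vis = np.zeros(6)
b_hid = np.zeros(4)
grad_W, grad_b_vis, grad_b_hid = cd_k_gradient(W, b_vis, b_hid, v_data, k=1, rng=rng)
```

CD-K always stops the chain after exactly K steps. By the abstract's description, MCLV-K instead treats K as a maximum and uses a stopping set built from the training data, so its running time is random.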
(Yet) Another Theoretical Model of Thinking
This paper presents a theoretical, idealized model of the thinking process with the following characteristics: 1) the model can produce complex thought sequences and can be generalized to new inputs, 2) it can receive and maintain input information indefinitely for the generation of thoughts and later use, and 3) it supports learning while executing. The crux of the model lies in the concept of internal consistency: the generated thoughts should always be consistent with the inputs from which they are created. Its merit, apart from the capability to generate new creative thoughts from an internal mechanism, depends on its potential to help training generalize better. This is enabled by separating input information into several parts to be handled by different processing components, with a focus mechanism to fetch information for each. This modularized view, together with the focus mechanism, ties the model to computationally capable Turing machines. As a final remark, this paper constructively shows that the computational complexity of the model at least equals, if not surpasses, that of a universal Turing machine.
Modeling Human Motion Using Binary Latent Variables
Taylor, Graham W., Hinton, Geoffrey E., Roweis, Sam T.
We propose a nonlinear generative model for human motion data that uses an undirected model with binary latent variables and real-valued "visible" variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. Such an architecture makes online inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously captures several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing online filling-in of data lost during motion capture.