negative phase
Mean-Field Assisted Deep Boltzmann Learning with Probabilistic Computers
Chowdhury, Shuvro, Niazi, Shaila, Camsari, Kerem Y.
Despite their appeal as physics-inspired, energy-based and generative nature, general Boltzmann Machines (BM) are considered intractable to train. This belief led to simplified models of BMs with restricted intralayer connections or layer-by-layer training of deep BMs. Recent developments in domain-specific hardware -- specifically probabilistic computers (p-computer) with probabilistic bits (p-bit) -- may change established wisdom on the tractability of deep BMs. In this paper, we show that deep and unrestricted BMs can be trained using p-computers generating hundreds of billions of Markov Chain Monte Carlo (MCMC) samples per second, on sparse networks developed originally for use in D-Wave's annealers. To maximize the efficiency of learning the p-computer, we introduce two families of Mean-Field Theory assisted learning algorithms, or xMFTs (x = Naive and Hierarchical). The xMFTs are used to estimate the averages and correlations during the positive phase of the contrastive divergence (CD) algorithm and our custom-designed p-computer is used to estimate the averages and correlations in the negative phase. A custom Field-Programmable-Gate Array (FPGA) emulation of the p-computer architecture takes up to 45 billion flips per second, allowing the implementation of CD-$n$ where $n$ can be of the order of millions, unlike RBMs where $n$ is typically 1 or 2. Experiments on the full MNIST dataset with the combined algorithm show that the positive phase can be efficiently computed by xMFTs without much degradation when the negative phase is computed by the p-computer. Our algorithm can be used in other scalable Ising machines and its variants can be used to train BMs, previously thought to be intractable.
Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model
We derive an optimal learning rule in the sense of mutual information maximization for a spiking neuron model. Under the assumption of small fluctuations of the input, we find a spike-timing dependent plas- ticity (STDP) function which depends on the time course of excitatory postsynaptic potentials (EPSPs) and the autocorrelation function of the postsynaptic neuron. We show that the STDP function has both positive and negative phases. The positive phase is related to the shape of the EPSP while the negative phase is controlled by neuronal refractoriness.
A new harmonium for pattern recognition in survival data
Donker, Hylke C., Groen, Harry J. M.
Background: Survival analysis concerns the study of timeline data where the event of interest may remain unobserved (i.e., censored). Studies commonly record more than one type of event, but conventional survival techniques focus on a single event type. We set out to integrate both multiple independently censored time-to-event variables as well as missing observations. Methods: An energy-based approach is taken with a bi-partite structure between latent and visible states, commonly known as harmoniums (or restricted Boltzmann machines). Results: The present harmonium is shown, both theoretically and experimentally, to capture non-linear patterns between distinct time recordings. We illustrate on real world data that, for a single time-to-event variable, our model is on par with established methods. In addition, we demonstrate that discriminative predictions improve by leveraging an extra time-to-event variable. Conclusions: Multiple time-to-event variables can be successfully captured within the harmonium paradigm.
Cyclical DJIA โ spxbot
Without a real schedule, I'm developing the cyclical analysis tool and I have to say that I was surprised from this chart brewed last Friday. It's not perfect at all, but it gives a different perspective to index forecast, totally separated from the model and artificial intelligence tools. The chart shows the Dow Jones Industrial Average (not the S&P 500, as DJIA has a much longer history) that in entering a cyclical negative phase, which does not means that the index will crash, but that, as in previous negative phases, may encounter a more "meditative" phase for next two to three months. Analysis is conducted on the DJIA historical daily data from May 1896.
Weighted Contrastive Divergence
Merino, Enrique Romero, Castrillejo, Ferran Mazzanti, Pin, Jordi Delgado, Prats, David Buchaca
Learning algorithms for energy based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. In this way one has to resort to approximation schemes for the evaluation of the gradient. This is the case of Restricted Boltzmann Machines (RBM) and its learning algorithm Contrastive Divergence (CD). It is well-known that CD has a number of shortcomings, and its approximation to the gradient has several drawbacks. Overcoming these defects has been the basis of much research and new algorithms have been devised, such as persistent CD. In this manuscript we propose a new algorithm that we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. However small these modifications may be, experimental work reported in this paper suggest that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost.
On the Challenges of Physical Implementations of RBMs
Dumoulin, Vincent, Goodfellow, Ian J., Courville, Aaron, Bengio, Yoshua
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low-precision and limited range of the parameters and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are based on the D-Wave Two computer, but the issues we investigate arise in most forms of physical computation. Our findings suggest that designers of new physical computing hardware and algorithms for physical computers should focus their efforts on overcoming the limitations imposed by the topology restrictions of currently existing physical computers.
On the Challenges of Physical Implementations of RBMs
Dumoulin, Vincent (Universitรฉ de Montrรฉal) | Goodfellow, Ian J (Universitรฉ de Montrรฉal) | Courville, Aaron (Universitรฉ de Montrรฉal) | Bengio, Yoshua (Universitรฉ de Montrรฉal)
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the costof sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low-precision and limited range of the parameters and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are based on the D-Wave Two computer, but the issues we investigate arise in most forms of physical computation.Our findings suggest that designers of new physical computing hardware and algorithms for physical computers should focus their efforts on overcoming the limitations imposed by the topology restrictions of currently existing physical computers.
Training Restricted Boltzmann Machine by Perturbation
Ravanbakhsh, Siamak, Greiner, Russell, Frey, Brendan
A new approach to maximum likelihood learning of discrete graphical models and RBM in particular is introduced. Our method, Perturb and Descend (PD) is inspired by two ideas (I) perturb and MAP method for sampling (II) learning by Contrastive Divergence minimization. In contrast to perturb and MAP, PD leverages training data to learn the models that do not allow efficient MAP estimation. During the learning, to produce a sample from the current model, we start from a training data and descend in the energy landscape of the "perturbed model", for a fixed number of steps, or until a local optima is reached. For RBM, this involves linear calculations and thresholding which can be very fast. Furthermore we show that the amount of perturbation is closely related to the temperature parameter and it can regularize the model by producing robust features resulting in sparse hidden layer activation.