A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation
Baldi, Pierre, Sadowski, Peter
In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules is obtained by first specifying the nature of the local variables, and then the functional form that ties them together into each learning rule. Such a framework also enables the systematic discovery of new learning rules and exploration of relationships between learning rules and group symmetries. We study polynomial local learning rules stratified by their degree and analyze their behavior and capabilities in both linear and non-linear units and networks. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning where target information is communicated to the deep layers through a backward learning channel. The nature of the communicated information about the targets and the structure of the learning channel partition the space of learning algorithms. We estimate the learning channel capacity associated with several algorithms and show that backpropagation outperforms them by simultaneously maximizing the information rate and minimizing the computational cost, even in recurrent networks. The theory clarifies the concept of Hebbian learning, establishes the power and limitations of local learning rules, introduces the learning channel which enables a formal analysis of the optimality of backpropagation, and explains the sparsity of the space of learning rules discovered so far.
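As an informal illustration (not code from the paper), the sketch below applies the simplest degree-two polynomial local rule, the classical Hebbian update Δw ∝ (post-synaptic activity) × (pre-synaptic activity), to a single layer of linear units; all names and sizes are placeholders.

    import numpy as np

    def hebbian_update(W, pre, post, lr=0.01):
        # Degree-two polynomial local rule: each weight change depends only on
        # its own pre-synaptic activity pre[j] and post-synaptic activity post[i].
        return W + lr * np.outer(post, pre)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(3, 5))        # 5 inputs feeding 3 linear units
    for _ in range(100):
        x = rng.normal(size=5)                    # pre-synaptic activities
        y = W @ x                                 # post-synaptic activities
        W = hebbian_update(W, x, y)               # purely local weight update

Using target values in place of y at the top layer would give the kind of supervised but still purely local rule that the abstract contrasts with local deep learning, where target information must travel back through a learning channel.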
Learning Activation Functions to Improve Deep Neural Networks
Agostinelli, Forest, Hoffman, Matthew, Sadowski, Peter, Baldi, Pierre
Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51% error), CIFAR-100 (30.83% error), and a benchmark from high-energy physics involving Higgs boson decay modes.
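A minimal sketch (assuming PyTorch; parameter names and initialization are illustrative, not taken from the paper) of one way to write such a per-neuron learnable piecewise linear activation: a rectified linear term plus a sum of learnable hinge functions, trained by gradient descent along with the rest of the network.

    import torch
    import torch.nn as nn

    class AdaptivePiecewiseLinear(nn.Module):
        # h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s), with a_s and b_s
        # learned separately for every neuron.
        def __init__(self, num_neurons, num_hinges=2):
            super().__init__()
            self.a = nn.Parameter(0.01 * torch.randn(num_hinges, num_neurons))
            self.b = nn.Parameter(torch.zeros(num_hinges, num_neurons))

        def forward(self, x):                     # x: (batch, num_neurons)
            out = torch.relu(x)
            for s in range(self.a.shape[0]):
                out = out + self.a[s] * torch.relu(-x + self.b[s])
            return out

    # Drop-in replacement for a fixed ReLU layer:
    act = AdaptivePiecewiseLinear(num_neurons=128)
    y = act(torch.randn(32, 128))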
Searching for Higgs Boson Decay Modes with Deep Learning
Sadowski, Peter J., Whiteson, Daniel, Baldi, Pierre
Particle colliders enable us to probe the fundamental nature of matter by observing exotic particles produced by high-energy collisions. Because the experimental measurements from these collisions are necessarily incomplete and imprecise, machine learning algorithms play a major role in the analysis of experimental data. The high-energy physics community typically relies on standardized machine learning software packages for this analysis, and devotes substantial effort towards improving statistical power by hand crafting high-level features derived from the raw collider measurements. In this paper, we train artificial neural networks to detect the decay of the Higgs boson to tau leptons on a dataset of 82 million simulated collision events. We demonstrate that deep neural network architectures are particularly well-suited for this task with the ability to automatically discover high-level features from the data and increase discovery significance.
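The architecture below is purely illustrative (layer sizes, feature count, and activation choices are placeholders, not the configuration used in the paper): a deep feedforward classifier mapping low-level collider measurements for one event to a signal-versus-background probability.

    import torch.nn as nn

    model = nn.Sequential(                        # hypothetical sizes
        nn.Linear(25, 300), nn.Tanh(),
        nn.Linear(300, 300), nn.Tanh(),
        nn.Linear(300, 300), nn.Tanh(),
        nn.Linear(300, 1), nn.Sigmoid(),          # P(signal | event features)
    )
    loss_fn = nn.BCELoss()                        # trained on simulated collision events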
Understanding Dropout
Baldi, Pierre, Sadowski, Peter J.
Dropout is a relatively new algorithm for training neural networks which relies on stochastically "dropping out" neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characterized by three recursive equations, including the approximation of expectations by normalized weighted geometric means. We provide estimates and bounds for these approximations and corroborate the results with simulations. We also show in simple cases how dropout performs stochastic gradient descent on a regularized error function.
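The following numerical check (illustrative, not the paper's derivation) compares the true dropout ensemble average of a single logistic unit with its normalized weighted geometric mean approximation, which for a logistic unit amounts to applying the sigmoid to the expected input, i.e. scaling the inputs by the retention probability.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    w, x, p = rng.normal(size=10), rng.normal(size=10), 0.5   # p = retention prob.

    masks = rng.random((100_000, 10)) < p         # random dropout configurations
    ensemble_mean = sigmoid((masks * x) @ w).mean()

    nwgm = sigmoid(p * (x @ w))                   # NWGM approximation of the mean
    print(ensemble_mean, nwgm)                    # the two values are typically close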
Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
Pollastri, Gianluca, Baldi, Pierre, Vullo, Alessandro, Frasconi, Paolo
We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarse-grained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance.
Universal Approximation and Learning of Trajectories Using Oscillators
Baldi, Pierre, Hornik, Kurt
Natural and artificial neural circuits must be capable of traversing specific state space trajectories. A natural approach to this problem is to learn the relevant trajectories from examples. Unfortunately, gradient descent learning of complex trajectories in amorphous networks is unsuccessful. We suggest a possible approach where trajectories are realized by combining simple oscillators, in various modular ways. We contrast two regimes of fast and slow oscillations. In all cases, we show that banks of oscillators with bounded frequencies have universal approximation properties. Open questions are also discussed briefly.
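As a toy illustration of the approximation idea (not the paper's construction), the snippet below realizes a discontinuous target trajectory with a small bank of fixed, bounded-frequency oscillators by fitting their amplitudes with least squares.

    import numpy as np

    t = np.linspace(0.0, 2 * np.pi, 500)
    target = np.sign(np.sin(3 * t))               # arbitrary target trajectory

    freqs = np.arange(1, 16)                      # bounded set of oscillator frequencies
    basis = np.column_stack(
        [np.sin(f * t) for f in freqs] + [np.cos(f * t) for f in freqs]
    )
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    approx = basis @ coeffs                       # trajectory produced by the oscillator bank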
Inferring Ground Truth from Subjective Labelling of Venus Images
Smyth, Padhraic, Fayyad, Usama M., Burl, Michael C., Perona, Pietro, Baldi, Pierre
In practical situations, experts may visually examine the images and provide a subjective noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a nontrivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant in terms of quantifying both human and algorithm detection performance.
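A minimal sketch of the general idea, EM-style aggregation of several subjective binary labels with per-labeller accuracies (a simplified stand-in, not the model used in the paper):

    import numpy as np

    def em_ground_truth(labels, n_iter=50):
        # labels: (n_items, n_labellers) matrix of 0/1 subjective labels.
        # Returns posterior P(true label = 1) per item and an accuracy per labeller,
        # assuming a uniform prior and symmetric labeller error rates.
        n_items, _ = labels.shape
        q = labels.mean(axis=1)                   # initial soft ground truth
        for _ in range(n_iter):
            acc = (q @ labels + (1 - q) @ (1 - labels)) / n_items   # M-step
            acc = np.clip(acc, 1e-3, 1 - 1e-3)
            log_odds = (labels * np.log(acc / (1 - acc))            # E-step
                        + (1 - labels) * np.log((1 - acc) / acc)).sum(axis=1)
            q = 1.0 / (1.0 + np.exp(-log_odds))
        return q, acc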