
Collaborating Authors

 Sompolinsky, Haim


A new role for circuit expansion for learning in neural networks

arXiv.org Machine Learning

Many sensory pathways in the brain rely on sparsely active populations of neurons downstream of the input stimuli. The biological reason for the occurrence of expanded structure in the brain is unclear, but it may be that expansion increases the expressive power of a neural network. In this work, we show that expanding a neural network can improve its generalization performance even when the expanded structure is pruned after the learning period. To study this setting we use a teacher-student framework in which a perceptron teacher network generates labels corrupted with small amounts of noise. We then train a student network that is structurally matched to the teacher and can achieve optimal accuracy if given the teacher's synaptic weights. We find that sparse expansion of the input of a student perceptron network both increases its capacity and improves its generalization performance when learning a noisy rule from a teacher perceptron, even when the expansion is pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input, and we analyze this network in the mean field limit. By solving the mean field equations, we show that the generalization error of the stochastic expanded student network continues to drop as the size of the network increases. The improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.
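
As a concrete illustration of this pruned-expansion setting, the following is a minimal numerical sketch, not the paper's protocol: a teacher perceptron produces noisy labels, a student is trained by the perceptron rule with extra stochastic input units, and the expansion is discarded before measuring generalization against the clean teacher rule. All sizes, the noise level, and the training schedule are illustrative assumptions.

```python
# Sketch of the teacher-student setup with a pruned stochastic expansion.
# N, M, P, the noise level, and the epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, M, P = 100, 200, 400              # input dim, expansion size, training examples
w_teacher = rng.standard_normal(N)

def teacher_labels(X, noise=0.0):
    # Labels from the teacher perceptron, optionally corrupted by noise.
    return np.sign(X @ w_teacher + noise * rng.standard_normal(len(X)))

X_train = rng.standard_normal((P, N))
y_train = teacher_labels(X_train, noise=0.5)      # noisy rule for training

Z = rng.standard_normal((P, M))                   # stochastic units, uncorrelated with X
X_exp = np.hstack([X_train, Z])

w = np.zeros(N + M)
for _ in range(200):                              # perceptron rule on expanded input
    for x, lab in zip(X_exp, y_train):
        if lab * (w @ x) <= 0:
            w += lab * x

w_pruned = w[:N]                                  # prune the expansion after learning

X_test = rng.standard_normal((2000, N))
y_test = teacher_labels(X_test)                   # clean rule for generalization
acc = np.mean(np.sign(X_test @ w_pruned) == y_test)
print(f"test accuracy after pruning: {acc:.3f}")
```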


Classification and Geometry of General Perceptual Manifolds

arXiv.org Machine Learning

Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination require classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry, revealing a remarkable relation to the mathematics of conic decomposition. Novel geometrical measures of manifold radius and manifold dimension are introduced that explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including L2 ellipsoids prototypical of strictly convex manifolds, L1 balls representing polytopes consisting of finite sample points, and orientation manifolds which arise from neurons tuned to respond to a continuous angle variable, such as object orientation. The effects of label sparsity on the classification capacity of manifolds are elucidated, revealing a scaling relation between label sparsity and manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying the statistical mechanics of linear classification to data arising from neuronal responses to object stimuli, as well as to artificial deep networks trained for object recognition tasks.
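
A crude numerical counterpart to the capacity analysis, under assumed parameters: sample points from P randomly labeled spherical manifolds (a special case of the L2 ellipsoids above) and test with a near-hard-margin linear SVM whether every sampled point is classified correctly, as the load P/N varies. The paper's theory is analytic; this is only a sampling-based proxy.

```python
# Sampling-based proxy for manifold classification capacity: fraction of
# random manifold dichotomies that are linearly separable at a given load.
# All sizes, the radius, and the number of repeats are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
N, samples_per_manifold, radius = 50, 20, 0.5

def separable(P):
    centers = rng.standard_normal((P, N))
    labels = rng.choice([-1, 1], size=P)
    X, y = [], []
    for c, lab in zip(centers, labels):
        pts = c + radius * rng.standard_normal((samples_per_manifold, N))
        X.append(pts)
        y.extend([lab] * samples_per_manifold)
    clf = LinearSVC(C=1e6, max_iter=20000).fit(np.vstack(X), y)  # ~hard margin
    return np.all(clf.predict(np.vstack(X)) == y)

for alpha in (0.5, 1.0, 1.5, 2.0):
    P = int(alpha * N)
    frac = np.mean([separable(P) for _ in range(10)])
    print(f"alpha = P/N = {alpha:.1f}: separable fraction = {frac:.1f}")
```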


Learning Data Manifolds with a Cutting Plane Method

arXiv.org Machine Learning

We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_CP, based upon a cutting-plane approach, that efficiently solves a quadratic semi-infinite programming problem to find the maximum margin solution. We provide a proof of convergence as well as a polynomial bound on the number of iterations required for a desired tolerance in the objective function. The efficiency and performance of M_CP are demonstrated in high-dimensional simulations and on image manifolds generated from the ImageNet dataset. Our results indicate that M_CP rapidly learns good classifiers and shows superior generalization performance compared with conventional maximum margin methods that rely on data augmentation.
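
The sketch below illustrates the general cutting-plane idea, not the paper's M_CP implementation: for spherical manifolds the margin-minimizing point has a closed form, so the active set can be grown by alternating a standard max-margin solve with a search for the worst violating point on each manifold. The off-the-shelf SVM solver and all problem sizes are stand-in assumptions.

```python
# Cutting-plane sketch for max-margin classification of spherical manifolds
# {c + r*s : ||s|| <= 1}; the margin-minimizing point is c - y*r*w/||w||.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
N, P, r = 30, 20, 0.3
centers = 2.0 * rng.standard_normal((P, N))
labels = rng.choice([-1, 1], size=P)

X, y = list(centers), list(labels)                   # start from the manifold centers
for it in range(50):
    clf = SVC(kernel="linear", C=1e6).fit(np.array(X), np.array(y))  # ~hard margin
    w, b = clf.coef_[0], clf.intercept_[0]
    violated = False
    for c, lab in zip(centers, labels):
        x_worst = c - lab * r * w / np.linalg.norm(w)    # worst point on this sphere
        if lab * (w @ x_worst + b) < 1.0 - 1e-6:         # margin constraint violated?
            X.append(x_worst)
            y.append(lab)
            violated = True
    if not violated:
        break
print(f"converged after {it + 1} iterations with {len(X)} active points")
```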


Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

arXiv.org Machine Learning

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. To evaluate the impact of both input and output noise, we determine the robustness of single-neuron stimulus-selective responses, as well as the robustness of attractor states of networks of neurons performing memory tasks. We find that robustness to output noise requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. We evaluate the conditions required for this regime to exist and determine the properties of networks operating within it. A plausible synaptic plasticity rule for learning balanced weight configurations is presented. Our theory predicts an optimal ratio of the number of excitatory and inhibitory synapses for maximizing the encoding capacity of balanced networks for given statistics of afferent activations. Previous work has shown that balanced networks amplify spatio-temporal variability and account for observed asynchronous irregular states. Here we present a novel type of balanced network that amplifies small changes in the impinging signals and emerges automatically from learning to perform neuronal and network functions robustly.
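
As a toy version of learning under sign constraints (an assumed simplification of the setting above, not the paper's plasticity rule), the following trains a perceptron whose first NE weights are constrained to be excitatory (non-negative) and the rest inhibitory (non-positive), projecting back onto the allowed signs after each update.

```python
# Sign-constrained perceptron: excitatory weights stay >= 0, inhibitory <= 0.
# The 4:1 E/I split, load P/N, and learning rate are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N, NE, P = 200, 160, 100
sign = np.concatenate([np.ones(NE), -np.ones(N - NE)])

X = rng.standard_normal((P, N))
y = rng.choice([-1, 1], size=P)
w = 0.01 * sign.copy()                             # start with the allowed signs

for _ in range(500):
    for x, lab in zip(X, y):
        if lab * (w @ x) <= 0:
            w += (0.1 / N) * lab * x
            w = sign * np.maximum(sign * w, 0.0)   # project onto allowed signs
errors = int(np.sum(np.sign(X @ w) != y))
print(f"training errors: {errors}")
print(f"mean |w_E|: {np.abs(w[:NE]).mean():.4f}, mean |w_I|: {np.abs(w[NE:]).mean():.4f}")
```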


Optimal Architectures in a Solvable Model of Deep Networks

Neural Information Processing Systems

Deep neural networks have received considerable attention due to the success of their training for real-world machine learning applications. They are also of great interest for understanding sensory processing in cortical sensory hierarchies. The purpose of this work is to advance our theoretical understanding of the computational benefits of these architectures. Using a simple model of clustered noisy inputs and a simple learning rule, we provide analytically derived recursion relations describing the propagation of the signals along the deep network. By analyzing these equations and defining performance measures, we show that these model networks have optimal depths. We further explore the dependence of the optimal architecture on the system parameters.
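
The recursion relations themselves are derived in the paper; as a hedged stand-in, this simulation propagates two noisy copies of the same cluster template through a deep network of random sign-neuron layers and tracks their overlap with depth, which is the kind of quantity those recursions describe. Width, depth, and noise level are illustrative.

```python
# Overlap propagation through a deep network of random sign-neuron layers.
# N, depth, and the input noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
N, depth, noise = 500, 10, 0.2

template = np.sign(rng.standard_normal(N))
x1 = np.where(rng.random(N) < noise, -template, template)   # noisy copy 1
x2 = np.where(rng.random(N) < noise, -template, template)   # noisy copy 2

for layer in range(depth):
    print(f"layer {layer}: overlap = {np.mean(x1 * x2):.3f}")
    W = rng.standard_normal((N, N)) / np.sqrt(N)            # fresh random layer
    x1, x2 = np.sign(W @ x1), np.sign(W @ x2)
```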


Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics

Neural Information Processing Systems

How are the spatial patterns of spontaneous and evoked population responses related? We study the impact of connectivity on the spatial pattern of fluctuations in the input-generated response of a neural network by comparing the distribution of evoked and intrinsically generated activity across the different units. We develop a complementary approach to principal component analysis in which separate high-variance directions are typically derived for each input condition. We analyze subspace angles to compute the difference between the shapes of trajectories corresponding to different network states, and the orientation of the low-dimensional subspaces that driven trajectories occupy within the full space of neuronal activity. In addition to revealing how the spatiotemporal structure of spontaneous activity affects input-evoked responses, these methods can be used to infer input selectivity induced by network dynamics from experimentally accessible measures of spontaneous activity (e.g., from voltage- or calcium-sensitive optical imaging experiments). We conclude that the absence of a detailed spatial map of afferent inputs and cortical connectivity does not limit our ability to design spatially extended stimuli that evoke strong responses.
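
A minimal sketch of the subspace-angle comparison on synthetic data: take the top principal directions of "spontaneous" and "evoked" population activity and compute the principal angles between the two subspaces with scipy.linalg.subspace_angles. The generative model and all dimensions here are assumptions for illustration.

```python
# Principal angles between the dominant subspaces of two activity datasets.
# The low-rank-plus-noise generative model is an assumption for illustration.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(5)
n_units, n_times, k = 80, 1000, 5

def top_components(activity, k):
    # Rows are time points, columns are units; return the top-k PCs (units x k).
    activity = activity - activity.mean(axis=0)
    _, _, vt = np.linalg.svd(activity, full_matrices=False)
    return vt[:k].T

mix_spont = rng.standard_normal((k, n_units))
mix_evoked = mix_spont + 0.5 * rng.standard_normal((k, n_units))  # related subspaces
spont = rng.standard_normal((n_times, k)) @ mix_spont \
    + 0.3 * rng.standard_normal((n_times, n_units))
evoked = rng.standard_normal((n_times, k)) @ mix_evoked \
    + 0.3 * rng.standard_normal((n_times, n_units))

angles = subspace_angles(top_components(spont, k), top_components(evoked, k))
print("principal angles (deg):", np.degrees(angles).round(1))
```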


Short-term memory in neuronal networks through dynamical compressed sensing

Neural Information Processing Systems

Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of Gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can outperform an equivalent feedforward network. However, a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of "orthogonal" recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance.
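
A minimal sketch of the dynamical compressed-sensing idea under assumed parameters: a linear network x_{t+1} = W x_t + v s_t, with W a scaled random orthogonal matrix, stores a sparse input sequence longer than the number of neurons, and the sequence is recovered from the instantaneous state by L1 minimization (Lasso here as a stand-in decoder, not the paper's analysis).

```python
# Linear orthogonal network storing a sparse sequence with T > N, recovered
# from the final state by L1 minimization. Sizes and gamma are illustrative;
# gamma < 1 keeps the dynamics stable while down-weighting older inputs.
import numpy as np
from scipy.stats import ortho_group
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
N, T, k, gamma = 50, 100, 5, 0.99        # neurons, sequence length, sparsity, decay

W = gamma * ortho_group.rvs(N, random_state=6)
v = rng.standard_normal(N) / np.sqrt(N)

s = np.zeros(T)                          # sparse input sequence, longer than N
s[rng.choice(T, k, replace=False)] = rng.standard_normal(k)

x = np.zeros(N)                          # run the network to its final state
for t in range(T):
    x = W @ x + v * s[t]

# Effective measurement matrix: column j maps input at time j to the final state.
M = np.column_stack([np.linalg.matrix_power(W, T - 1 - j) @ v for j in range(T)])

s_hat = Lasso(alpha=1e-4, max_iter=50000).fit(M, x).coef_
print(f"relative recovery error: {np.linalg.norm(s_hat - s) / np.linalg.norm(s):.3f}")
```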


Theory of spike timing based neural classifiers

arXiv.org Machine Learning

We study the computational capacity of a model neuron, the Tempotron, which classifies sequences of spikes by linear-threshold operations. We use statistical mechanics and extreme value theory to derive the capacity of the system in random classification tasks. In contrast to its static analog, the Perceptron, the Tempotron's solution space consists of a large number of small clusters of weight vectors. The capacity of the system per synapse is finite in the large size limit and weakly diverges with the stimulus duration relative to the membrane and synaptic time constants.
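
A simplified Tempotron sketch, with assumptions flagged: one spike per afferent, a single-exponential PSP kernel in place of the usual double exponential, and the standard Tempotron update that nudges weights by the PSP values at the time of maximum voltage whenever the output is wrong.

```python
# Simplified Tempotron: classify spike patterns by whether the weighted sum
# of PSP kernels ever crosses threshold. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n_syn, T, tau, theta = 100, 100, 15.0, 1.0

def psp(t_grid, t_spike):
    # Exponential PSP kernel (the full model uses a double exponential).
    dt = t_grid[:, None] - t_spike[None, :]
    return np.where(dt > 0, np.exp(-dt / tau), 0.0)

t_grid = np.arange(T, dtype=float)
spikes = [rng.uniform(0, T, n_syn) for _ in range(40)]   # one spike per synapse
labels = rng.choice([0, 1], size=40)
w = 0.01 * rng.standard_normal(n_syn)

for _ in range(200):
    for t_sp, lab in zip(spikes, labels):
        V = psp(t_grid, t_sp) @ w                        # voltage trace
        t_max = int(np.argmax(V))
        out = int(V[t_max] >= theta)
        if out != lab:                                   # Tempotron update
            w += 0.01 * (lab - out) * psp(t_grid[[t_max]], t_sp)[0]

correct = sum(int((psp(t_grid, t_sp) @ w).max() >= theta) == lab
              for t_sp, lab in zip(spikes, labels))
print(f"training accuracy: {correct / 40:.2f}")
```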


Correlation Codes in Neuronal Populations

Neural Information Processing Systems

Population codes often rely on the tuning of the mean responses to the stimulus parameters. However, this information can be greatly suppressed by long-range correlations. Here we study the efficiency of coding information in the second-order statistics of the population responses. We show that the Fisher information of this system grows linearly with the size of the system. We propose a bilinear readout model for extracting information from correlation codes, and evaluate its performance in discrimination and estimation tasks. It is shown that the main source of information in this system is the stimulus dependence of the variances of the single-neuron responses.
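
A toy illustration of a correlation code and the simplest bilinear readout (r^T Q r with Q = I): the two stimuli evoke identical mean responses but different single-neuron variances, so a thresholded quadratic statistic discriminates where any linear readout of the mean fails. All parameters are illustrative assumptions.

```python
# Discriminating two stimuli that differ only in response variance, using a
# quadratic (bilinear, Q = I) readout. Variances and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(8)
n_neurons, n_trials = 100, 2000
sigma = {0: 1.0, 1: 1.2}                 # stimulus-dependent variances

def trial(stim):
    return sigma[stim] * rng.standard_normal(n_neurons)   # zero mean for both stimuli

# Decision statistic: quadratic energy, thresholded midway between the two means.
threshold = n_neurons * (sigma[0]**2 + sigma[1]**2) / 2
correct = 0
for _ in range(n_trials):
    stim = int(rng.integers(2))
    guess = int(np.sum(trial(stim)**2) > threshold)
    correct += int(guess == stim)
print(f"discrimination accuracy: {correct / n_trials:.3f}")
```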