Industry
Towards a learning-theoretic analysis of spike-timing dependent plasticity
Balduzzi, David, Besserve, Michel
This paper suggests a learning-theoretic perspective on how synaptic plasticity benefits global brain functioning. We introduce a model, the selectron, that (i) arises as the fast time constant limit of leaky integrate-and-fire neurons equipped with spiking timing dependent plasticity (STDP) and (ii) is amenable to theoretical analysis. We show that the selectron encodes reward estimates into spikes and that an error bound on spikes is controlled by a spiking margin and the sum of synaptic weights. Moreover, the efficacy of spikes (their usefulness to other reward maximizing selectrons) also depends on total synaptic strength. Finally, based on our analysis, we propose a regularized version of STDP, and show the regularization improves the robustness of neuronal learning when faced with multiple stimuli.
Near-optimal Differentially Private Principal Components
Chaudhuri, Kamalika, Sarwate, Anand, Sinha, Kaushik
Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there this a large performance gap between the existing methods and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling.
Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction
Lena, Pietro D., Nagata, Ken, Baldi, Pierre F.
Residue-residue contact prediction is a fundamental problem in protein structure prediction. Hower, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes ''time''. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components.
Confusion-Based Online Learning and a Passive-Aggressive Scheme
This paper provides the first ---to the best of our knowledge--- analysis of online learning algorithms for multiclass problems when the {\em confusion} matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e. concentration inequalities that apply to matrices, and more precisely to matrix martingales. We do establish generalization bounds for online learning algorithm and show how the theoretical study motivate the proposition of a new confusion-friendly learning procedure. This learning algorithm, called \copa (for COnfusion Passive-Aggressive) is a passive-aggressive learning algorithm; it is shown that the update equations for \copa can be computed analytically, thus allowing the user from having to recours to any optimization package to implement it.
Interpreting prediction markets: a stochastic approach
Frongillo, Rafael M., Penna, Nicholรกs Della, Reid, Mark D.
We strengthen recent connections between prediction markets and learning byshowing that a natural class of market makers can be understood as performing stochastic mirror descent when trader demands are sequentially drawnfrom a fixed distribution. This provides new insights into how market prices (and price paths) may be interpreted as a summary of the market's belief distribution by relating them to the optimization problem being solved. In particular, we show that under certain conditions the stationary pointof the stochastic process of prices generated by the market is equal to the market's Walrasian equilibrium of classic market analysis. Together, these results suggest how traditional market making mechanisms might be replaced with general purpose learning algorithms while still retaining guaranteesabout their behaviour.
Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
Rai, Piyush, Kumar, Abhishek, Daume, Hal
Multiple-output regression models require estimating multiple functions, one for each output. To improve parameter estimation in such models, methods based on structural regularization of the model parameters are usually needed. In this paper, we present a multiple-output regression model that leverages the covariance structure of the functions (i.e., how the multiple functions are related with each other) as well as the conditional covariance structure of the outputs. This is in contrast with existing methods that usually take into account only one of these structures. More importantly, unlike most of the other existing methods, none of these structures need be known a priori in our model, and are learned from the data. Several previously proposed structural regularization based multiple-output regression models turn out to be special cases of our model. Moreover, in addition to being a rich model for multiple-output regression, our model can also be used in estimating the graphical model structure of a set of variables (multivariate outputs) conditioned on another set of variables (inputs). Experimental results on both synthetic and real datasets demonstrate the effectiveness of our method.
Entropy Estimations Using Correlated Symmetric Stable Random Projections
Methods for efficiently estimating Shannon entropy of data streams have important applicationsin learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CCis no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite differenceof two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.
Imitation Learning by Coaching
He, He, Eisner, Jason, Daume, Hal
Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle's ability and the learner's policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to a novel cost-sensitive dynamic feature selection problem, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic features selection and two static feature selection methods.
Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds
Moldovan, Teodor M., Abbeel, Pieter
The expected return is a widely used objective in decision making under uncertainty. Manyalgorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections topreviously proposed objectives for risk-aware planing: minmax, exponential utility,percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing theproposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method, at scale.
Efficient and direct estimation of a neural subunit model for sensory coding
Vintch, Brett, Zaharia, Andrew, Movshon, J, Simoncelli, Eero P.
Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of self-similar linear filters. These filters cannot be found using spike-triggered averaging (STA), which estimates only a single filter. Other methods, like spike-triggered covariance (STC), define a multi-dimensional response subspace, but require substantial amounts of data and do not produce unique estimates of the linear filters. Rather, they provide a linear basis for the subspace in which the filters reside. Here, we define a 'subunit' model as an LN-LN cascade, in which the first linear stage is restricted to a set of shifted ("convolutional") copies of a common filter, and the first nonlinear stage consists of rectifying nonlinearities that are identical for all filter outputs; we refer to these initial LN elements as the 'subunits' of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data. The method performs well for both simulated and real data (from primate V1), and the resulting model outperforms STA and STC in terms of both cross-validated accuracy and efficiency.