Country
Data-Driven Online to Batch Conversions
Online learning algorithms are typically fast, memory efficient, and simple to implement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing online algorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions. Our conversions find hypotheses with a small risk by explicitly minimizing datadependent generalization bounds. We experimentally demonstrate the usefulness of our approach and in particular show that the data-driven conversions consistently outperform the data-independent conversions.
The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Dekel, Ofer, Shalev-shwartz, Shai, Singer, Yoram
The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithm for kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
Fast Krylov Methods for N-Body Learning
Freitas, Nando D., Wang, Yang, Mahdaviani, Maryam, Lang, Dustin
This paper addresses the issue of numerical computation in machine learning domains based on similarity metrics, such as kernel methods, spectral techniques and Gaussian processes. It presents a general solution strategy based on Krylov subspace iteration and fast N-body learning methods. The experiments show significant gains in computation and storage on datasets arising in image segmentation, object detection and dimensionality reduction. The paper also presents theoretical bounds on the stability of these methods.
Norepinephrine and Neural Interrupts
Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in welllearned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts.
Efficient estimation of hidden state dynamics from spike trains
Danoczy, Marton G., Hahnloser, Richard H. R.
Neurons can have rapidly changing spike train statistics dictated by the underlying network excitability or behavioural state of an animal. To estimate the time course of such state dynamics from single-or multiple neuron recordings, we have developed an algorithm that maximizes the likelihood of observed spike trains by optimizing the state lifetimes and the state-conditional interspike-interval (ISI) distributions. Our nonparametric algorithm is free of time-binning and spike-counting problems and has the computational complexity of a Mixed-state Markov Model operating on a state sequence of length equal to the total number of recorded spikes. As an example, we fit a two-state model to paired recordings of premotor neurons in the sleeping songbird. We find that the two state-conditional ISI functions are highly similar to the ones measured during waking and singing, respectively.
Learning from Data of Variable Quality
Crammer, Koby, Kearns, Michael, Wortman, Jennifer
We initiate the study of learning from multiple sources of limited data, each of which may be corrupted at a different rate. We develop a complete theory of which data sources should be used for two fundamental problems: estimating the bias of a coin, and learning a classifier in the presence of label noise. In both cases, efficient algorithms are provided for computing the optimal subset of data.
Size Regularized Cut for Data Clustering
Chen, Yixin, Zhang, Ya, Ji, Xiang
We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NPcomplete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.
Improved risk tail bounds for on-line algorithms
Cesa-bianchi, Nicolò, Gentile, Claudio
We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incrementally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.