Learning Management
On the Generalization Ability of On-Line Learning Algorithms
Cesa-bianchi, Nicolò, Conconi, Alex, Gentile, Claudio
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable to or better than the best known bounds.
Online Learning with Kernels
Kivinen, Jyrki, Smola, Alex J., Williamson, Robert C.
We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization.
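As a rough illustration of what an online kernel update for classification can look like, the sketch below performs stochastic gradient descent on a regularized hinge loss in an RKHS. It is a minimal sketch under stated assumptions, not the authors' exact update equations: the class name `OnlineKernelClassifier`, the Gaussian kernel, and the parameters `eta` (learning rate), `lam` (regularization strength), and `margin` are all chosen here for illustration, and the ν-trick is not included.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; an illustrative choice, not prescribed by the abstract."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


class OnlineKernelClassifier:
    """Hypothetical sketch: online gradient descent on a regularized hinge loss.

    The hypothesis is stored as a kernel expansion f(x) = sum_i alpha_i k(x_i, x).
    """

    def __init__(self, kernel=gaussian_kernel, eta=0.5, lam=0.01, margin=1.0):
        self.kernel = kernel
        self.eta = eta        # learning rate (assumed value)
        self.lam = lam        # regularization strength (assumed value)
        self.margin = margin  # target margin for the hinge loss
        self.points = []      # stored examples x_i
        self.alphas = []      # expansion coefficients alpha_i

    def decision(self, x):
        return sum(a * self.kernel(p, x) for a, p in zip(self.alphas, self.points))

    def update(self, x, y):
        """Process one labelled example (y in {-1, +1}); return the score before updating."""
        score = self.decision(x)
        # The regularizer shrinks every existing coefficient.
        decay = 1.0 - self.eta * self.lam
        self.alphas = [a * decay for a in self.alphas]
        # A margin violation adds a new term to the kernel expansion.
        if y * score < self.margin:
            self.points.append(x)
            self.alphas.append(self.eta * y)
        return score


# Usage: XOR-style data, which no linear classifier on the raw inputs can separate.
xor_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
model = OnlineKernelClassifier()
for _ in range(20):
    for x, y in xor_data:
        model.update(x, y)
```

Each example is seen once per pass and the hypothesis changes immediately, which is what makes the procedure online; the shrinking of old coefficients is what the regularizer contributes.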
Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
Khardon, Roni, Roth, Dan, Servedio, Rocco A.
We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however, we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.
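To make the idea of running the Perceptron over exponentially many conjunctions concrete, here is a minimal sketch. The abstract does not spell out its kernels, so the formula below is an assumption based on a standard counting argument: a conjunction of literals is satisfied by both x and y exactly when it uses only positions where the two vectors agree, so the number of shared conjunctions (counting the empty one) is 2 raised to the number of agreeing positions. All function names are hypothetical.

```python
from itertools import product


def all_conjunctions_kernel(x, y):
    """Number of conjunctions over literals satisfied by both x and y."""
    agreeing = sum(1 for a, b in zip(x, y) if a == b)
    return 2 ** agreeing


def count_shared_conjunctions(x, y):
    """Brute-force check over all 3^n conjunctions (small n only).

    Each variable is absent (None), required to be 1, or required to be 0.
    """
    def satisfies(v, conj):
        return all(c is None or v[i] == c for i, c in enumerate(conj))

    n = len(x)
    return sum(
        1
        for conj in product((None, 0, 1), repeat=n)
        if satisfies(x, conj) and satisfies(y, conj)
    )


def predict(mistakes, kernel, x):
    """Sign of the kernel expansion built from the stored mistakes."""
    score = sum(yi * kernel(xi, x) for xi, yi in mistakes)
    return 1 if score > 0 else -1


def kernel_perceptron(examples, kernel, epochs=100):
    """Kernel Perceptron: store each misclassified example instead of explicit weights."""
    mistakes = []
    for _ in range(epochs):
        errors = 0
        for x, y in examples:
            if predict(mistakes, kernel, x) != y:
                mistakes.append((x, y))
                errors += 1
        if errors == 0:
            break
    return mistakes


# Usage: learn the target "x1 AND NOT x2" over three Boolean variables.
inputs = list(product((0, 1), repeat=3))
examples = [(x, 1 if x[0] == 1 and x[1] == 0 else -1) for x in inputs]
mistakes = kernel_perceptron(examples, all_conjunctions_kernel)
```

The kernel evaluates in time linear in the number of variables even though the implicit feature space has 3^n coordinates. This shows only the efficiency side of the tradeoff; the example is too small to exhibit the exponential mistake counts the abstract warns about.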
On-line Learning from Finite Training Sets in Nonlinear Networks
Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for nonlinear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
Adaptive On-line Learning in Changing Environments
Murata, Noboru, Müller, Klaus-Robert, Ziehe, Andreas, Amari, Shun-ichi
An adaptive online algorithm extending the "learning of learning" idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.
Online Learning from Finite Training Sets: An Analytical Case Study
By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of the learning rate η (and, less importantly, weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.