Large Scale Online Learning
Bottou, Léon, Cun, Yann L.
Geometric Clustering Using the Information Bottleneck Method
Still, Susanne, Bialek, William, Bottou, Léon
We argue that K-means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of "smooth" initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms.
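To make the iteration concrete, here is a minimal sketch of a deterministic-annealing-style soft clustering loop of the kind the abstract alludes to. The function name, the fixed inverse temperature `beta`, and the random initialization are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch (illustrative, not the paper's code): soft assignments at
# inverse temperature beta; hard K-Means is recovered as beta -> infinity.
import numpy as np

def soft_kmeans(X, k, beta=10.0, iters=100, seed=0):
    """Softly cluster the rows of X into k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance between every point and every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Boltzmann-style soft assignments p(c|x) proportional to exp(-beta * d2).
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Centroid update: probability-weighted means of the data.
        centroids = (p.T @ X) / p.sum(axis=0)[:, None]
    return centroids, p
```

As `beta` grows the assignments harden and the update approaches classic K-Means; raising `beta` gradually is the deterministic-annealing schedule.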
Vicinal Risk Minimization
Chapelle, Olivier, Weston, Jason, Bottou, Léon, Vapnik, Vladimir
The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.
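A minimal sketch of the VRM idea as the abstract describes it: replace each training point with a Gaussian vicinity and minimize the empirical risk over samples drawn from those vicinities. The logistic model, the noise scale `sigma`, and the SGD schedule are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal VRM sketch (illustrative assumptions): logistic regression trained
# on samples drawn from Gaussian vicinities of the training points.
import numpy as np

def vrm_logistic_sgd(X, y, sigma=0.1, lr=0.1, epochs=50, seed=0):
    """Linear logistic classifier; y has labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Draw one sample from the Gaussian vicinity of x_i.
            x = X[i] + sigma * rng.standard_normal(X.shape[1])
            # Stochastic gradient step on the logistic loss log(1 + exp(-margin)).
            margin = y[i] * (w @ x)
            w += lr * y[i] * x / (1.0 + np.exp(margin))
    return w
```

With `sigma -> 0` the vicinities collapse to delta functions and the procedure reduces to ordinary empirical risk minimization, which is the bridge the abstract describes.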
Convergence Properties of the K-Means Algorithms
Bottou, Léon, Bengio, Yoshua
K-Means is a popular clustering algorithm used in many applications, including the initialization of more computationally expensive algorithms (Gaussian mixtures, Radial Basis Functions, Learning Vector Quantization and some Hidden Markov Models). The practice of this initialization procedure often gives the frustrating feeling that K-Means performs most of the task in a small fraction of the overall time. This motivated us to better understand this convergence speed. A second reason lies in the traditional debate between hard threshold (e.g.
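As a companion to the convergence discussion, here is a minimal sketch of the two K-Means variants at issue: the batch (Lloyd) update and an online update with a MacQueen-style per-cluster 1/n step size. The function names and initialization are illustrative assumptions, not the paper's code.

```python
# Minimal sketch (illustrative) of batch vs. online K-Means updates.
import numpy as np

def batch_kmeans_step(X, centroids):
    """One Lloyd iteration: assign points to nearest centroid, recompute means."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    for c in range(len(centroids)):
        members = X[labels == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
    return centroids, labels

def online_kmeans(X, k, seed=0):
    """Online K-Means: move the winning centroid toward each sample in turn."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)
    for x in X[rng.permutation(len(X))]:
        c = ((centroids - x) ** 2).sum(axis=1).argmin()
        counts[c] += 1
        centroids[c] += (x - centroids[c]) / counts[c]  # step size 1/n_c
    return centroids
```

The 1/n_c step makes each centroid the running mean of the samples assigned to it so far, which is what allows a single online pass to accomplish much of the clustering work.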