Crammer, Koby
Online Passive-Aggressive Algorithms
Shalev-Shwartz, Shai, Crammer, Koby, Dekel, Ofer, Singer, Yoram
We present a unified view of online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for all three problems. We prove worst-case loss bounds for various algorithms in both the realizable and the non-realizable case. A conversion of our main online algorithm to the batch learning setting is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge loss.
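For concreteness, here is a minimal Python sketch of the classification-case update (the PA-I variant, with an aggressiveness parameter C); the function and variable names are illustrative, not taken from the paper.

    import numpy as np

    def pa_update(w, x, y, C=1.0):
        """One Passive-Aggressive (PA-I) step for binary classification.

        Suffer the hinge loss on the current example, then move w just
        enough to satisfy the margin constraint, with the step capped by
        C. A sketch of the update analyzed in the paper, not the
        authors' code.
        """
        loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss this round
        if loss > 0.0:
            tau = min(C, loss / np.dot(x, x))    # PA-I step size
            w = w + tau * y * x                  # aggressive correction
        return w                                 # passive when loss is zero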
Online Classification on a Budget
Crammer, Koby, Kandola, Jaz, Singer, Yoram
Online algorithms for classification often require vast amounts of memory and computation time when employed in conjunction with kernel functions. In this paper we describe and analyze a simple approach for an on-the-fly reduction of the number of past examples used for prediction. Experiments performed with real datasets show that using the proposed algorithmic approach with a single epoch is competitive with the support vector machine (SVM), although the latter, being a batch algorithm, accesses each training example multiple times.
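A generic budgeted kernel Perceptron sketch illustrates the setting; note that the paper's actual removal criterion is more refined than the naive oldest-first eviction used here, which is only a placeholder assumption.

    def budget_kernel_perceptron(stream, kernel, budget=100):
        """Sketch of a kernel Perceptron with a cap on stored examples.

        Evicts the oldest stored example once the budget is exceeded;
        the paper's algorithm chooses which example to discard more
        carefully, so treat this eviction rule as a stand-in.
        """
        supports = []                            # stored (x_i, y_i) pairs
        for x, y in stream:
            score = sum(yi * kernel(xi, x) for xi, yi in supports)
            if y * score <= 0:                   # mistake: store the example
                supports.append((x, y))
                if len(supports) > budget:
                    supports.pop(0)              # naive eviction (see above)
        return supports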
Margin Analysis of the LVQ Algorithm
Crammer, Koby, Gilad-Bachrach, Ran, Navot, Amir, Tishby, Naftali
Prototype-based algorithms are commonly used to reduce the computational complexity of Nearest-Neighbour (NN) classifiers. In this paper we discuss theoretical and algorithmic aspects of such algorithms. On the theory side, we present margin-based generalization bounds which suggest that these kinds of classifiers can be more accurate than the 1-NN rule. Furthermore, we derive a training algorithm that selects a good set of prototypes using large-margin principles. We also show that the twenty-year-old Learning Vector Quantization (LVQ) algorithm emerges naturally from our framework.
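For reference, a sketch of the classical LVQ1 rule that the paper's margin-based derivation recovers and generalizes; hyperparameters such as the learning rate are illustrative.

    import numpy as np

    def lvq1_epoch(prototypes, proto_labels, X, y, lr=0.05):
        """One epoch of the classical LVQ1 update (a reference sketch).

        Attract the nearest prototype toward an example with a matching
        label, and repel it otherwise.
        """
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(prototypes - xi, axis=1))
            sign = 1.0 if proto_labels[j] == yi else -1.0
            prototypes[j] += sign * lr * (xi - prototypes[j])
        return prototypes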
Kernel Design Using Boosting
Crammer, Koby, Keshet, Joseph, Singer, Yoram
The focus of the paper is the problem of learning kernel operators from empirical data. We cast the kernel design problem as the construction of an accurate kernel from simple (and less accurate) base kernels. We use the boosting paradigm to perform the kernel construction process. To do so, we modify the booster so as to accommodate kernel operators. We also devise an efficient weak learner for simple kernels that is based on generalized eigenvector decomposition. We demonstrate the effectiveness of our approach on synthetic data and on the USPS dataset. On the USPS dataset, the performance of the Perceptron algorithm with learned kernels is systematically better than that of a fixed RBF kernel.
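The end product can be pictured as a weighted combination of base kernels; the sketch below only fixes that interface, since the boosting procedure and the eigenvector-based weak learner are the substance of the paper and are not reproduced here.

    def boosted_kernel(base_kernels, alphas):
        """Return K(x, z) = sum_t alphas[t] * base_kernels[t](x, z).

        A sketch of the learned kernel's form as a weighted combination
        of base kernel operators; how the weights are chosen by the
        modified booster is not shown.
        """
        def K(x, z):
            return sum(a * k(x, z) for a, k in zip(alphas, base_kernels))
        return K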
Pranking with Ranking
Crammer, Koby, Singer, Yoram
We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
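A sketch of one round of the PRank-style update described in the abstract: a weight vector w is kept together with k-1 increasing thresholds b, the predicted rank is the first threshold the score falls below, and a mistake shifts w and every violated threshold by one unit. Variable names are illustrative.

    import numpy as np

    def prank_round(w, b, x, y):
        """One online round: predict a rank, update on a mistake.

        w: weight vector; b: float array of k-1 increasing thresholds;
        y: true rank in {1, ..., k}.
        """
        score = np.dot(w, x)
        below = np.nonzero(score < b)[0]
        y_hat = below[0] + 1 if below.size else len(b) + 1  # predicted rank
        if y_hat != y:
            # y_r = +1 if the true rank lies above threshold r, else -1
            yr = np.where(np.arange(1, len(b) + 1) < y, 1.0, -1.0)
            tau = np.where(yr * (score - b) <= 0, yr, 0.0)  # violated thresholds
            w = w + tau.sum() * x
            b = b - tau
        return w, b, y_hat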
Improved Output Coding for Classification Using Continuous Relaxation
Crammer, Koby, Singer, Yoram
Output coding is a general method for solving multiclass problems by reducing them to multiple binary classification problems. Previous research on output coding has employed, almost solely, predefined discrete codes. We describe an algorithm that improves the performance of output codes by relaxing them to continuous codes. The relaxation procedure is cast as an optimization problem and is reminiscent of the quadratic program for support vector machines. We describe experiments with the proposed algorithm, comparing it to standard discrete output codes. The experimental results indicate that continuous relaxations of output codes often improve the generalization performance, especially for short codes.
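To make the setting concrete, here is a sketch of how a code matrix, discrete or continuous, is used at prediction time; the relaxation itself, a quadratic program reminiscent of the SVM program, is not reproduced here.

    import numpy as np

    def decode(code_matrix, binary_scores):
        """Correlation decoding for output codes (a sketch).

        code_matrix: one row per class, one column per binary problem;
        binary_scores: real-valued outputs of the binary classifiers on
        an instance. Predict the class whose code row agrees most with
        the classifier outputs; continuous code rows drop in unchanged.
        """
        return int(np.argmax(code_matrix @ binary_scores))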