C-bound
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
We thank all the reviewers for their insightful comments, suggestions, and references. On the novelty of the tandem loss: it is not new, but we were not aware of the prior work, and we thank Reviewer 2 for bringing it up. While most of the computed bounds are non-vacuous, they are not especially tight; the reviewers also asked for a discussion of potential ways to obtain tighter bound values, or whether there is a fundamental limitation. We provide some discussion in Sections 3.2 and 4.4.
Self-Bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound
Viallard, Paul, Germain, Pascal, Habrard, Amaury, Morvant, Emilie
In machine learning, ensemble methods [10] aim to combine hypotheses to make predictive models more robust and accurate. A weighted majority vote learning procedure is an ensemble method for classification where each voter/hypothesis is assigned a weight (i.e., its influence in the final voting). Among the most famous majority vote methods, we can cite Boosting [13], Bagging [5], and Random Forest [6]. Interestingly, most kernel-based classifiers, like Support Vector Machines [3, 7], can be seen as a majority vote of kernel functions. Understanding when and why weighted majority votes perform better than a single hypothesis is challenging. To study the generalization abilities of such majority votes, the PAC-Bayesian framework [34, 25] offers powerful tools to obtain Probably Approximately Correct (PAC) generalization bounds. Motivated by the fact that PAC-Bayesian analyses can lead to tight bounds (see e.g., [28]), developing algorithms to minimize such bounds is an important direction (e.g., [14, 11, 15, 24]). We focus on a class of PAC-Bayesian algorithms minimizing an upper bound on the majority vote's risk called the C-Bound.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
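As a pointer for readers, the C-Bound referred to in the abstract above can be stated as follows for binary classification; this formulation is our summary of the standard result, not text from the paper. Writing $M_Q(x,y) = y\,\mathbb{E}_{h\sim Q}[h(x)]$ for the $Q$-weighted margin on example $(x,y)$ with labels in $\{-1,+1\}$:

```latex
% C-Bound on the risk of the Q-weighted majority vote MV_Q,
% valid whenever the first margin moment is positive.
R(MV_Q) \;\le\; 1 \;-\;
  \frac{\left(\mathbb{E}_{(x,y)\sim D}\big[M_Q(x,y)\big]\right)^{2}}
       {\mathbb{E}_{(x,y)\sim D}\big[M_Q(x,y)^{2}\big]},
\qquad \text{provided } \mathbb{E}_{(x,y)\sim D}\big[M_Q(x,y)\big] > 0.
```

The bound follows from Cantelli's (one-sided Chebyshev) inequality applied to the margin, since the majority vote errs exactly when $M_Q(x,y) \le 0$ (counting ties as errors).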
On PAC-Bayesian Bounds for Random Forests
Lorenzen, Stephan Sloth, Igel, Christian, Seldin, Yevgeny
Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not account for the errors of individual classifiers averaging out when the majority vote is taken. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong, the advantage from taking the majority vote is small. The second approach, based on PAC-Bayesian C-bounds, takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set, which come at the cost of a smaller training set, gave better performance guarantees but worse performance in most experiments.
- Europe > France (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
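The factor-of-two relation between the majority vote and the Gibbs classifier mentioned in the abstract above holds deterministically on any sample: the vote errs only where at least half of the (uniformly weighted) voters err. The following sketch illustrates this with a synthetic ensemble (the voters and data here are hypothetical stand-ins, not the random forests from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem: n examples with labels in {-1, +1}, m voters.
n, m = 200, 15                       # odd m avoids tied votes
y = rng.choice([-1, 1], size=n)

# Hypothetical ensemble: each voter agrees with the label with prob. 0.7.
H = np.where(rng.random((n, m)) < 0.7, y[:, None], -y[:, None])

# Gibbs risk: expected error of a single voter drawn uniformly at random.
gibbs_risk = np.mean(H != y[:, None])

# Majority-vote risk over the same ensemble.
mv_pred = np.sign(H.sum(axis=1))
mv_risk = np.mean(mv_pred != y)

# First-order bound discussed above: R(MV) <= 2 * R(Gibbs).
print(mv_risk, 2 * gibbs_risk)
assert mv_risk <= 2 * gibbs_risk
```

The point the abstract makes is visible here: when voter errors are independent, the vote risk is far below twice the Gibbs risk, so the factor-two bound is loose; it only becomes sharp when errors are strongly correlated.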
Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm
Germain, Pascal, Lacasse, Alexandre, Laviolette, François, Marchand, Mario, Roy, Jean-Francis
We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > Quebec (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
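The empirical C-bound that MinCq works with can be computed directly from a matrix of voter predictions, weights, and labels. The sketch below is our illustration of that quantity only (the function name and the random voters are hypothetical; this is not the MinCq quadratic program itself):

```python
import numpy as np

def empirical_c_bound(H, rho, y):
    """Empirical C-bound computed from voter predictions.

    H   : (n, m) array of voter outputs in {-1, +1}
    rho : (m,) weight vector over the voters (sums to 1)
    y   : (n,) true labels in {-1, +1}
    """
    margins = y * (H @ rho)          # rho-weighted margin of each example
    mu1 = margins.mean()             # first margin moment
    mu2 = (margins ** 2).mean()      # second margin moment
    assert mu1 > 0, "C-bound requires a positive first margin moment"
    return 1.0 - mu1 ** 2 / mu2

# Toy illustration with random voters and uniform weights.
rng = np.random.default_rng(1)
n, m = 300, 9
y = rng.choice([-1, 1], size=n)
H = np.where(rng.random((n, m)) < 0.75, y[:, None], -y[:, None])
rho = np.full(m, 1.0 / m)

bound = empirical_c_bound(H, rho, y)
mv_risk = np.mean(np.sign(H @ rho) != y)
# By Cantelli's inequality, the empirical vote risk never exceeds
# the empirical C-bound when the first margin moment is positive.
print(bound, mv_risk)
```

MinCq then minimizes this bound over `rho`, which reduces to a quadratic program because, with the first margin moment fixed, minimizing the bound amounts to minimizing the second margin moment, a quadratic form in the weights.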
On the Generalization of the C-Bound to Structured Output Ensemble Methods
Laviolette, François, Morvant, Emilie, Ralaivola, Liva, Roy, Jean-Francis
It is well-known that learning predictive models capable of dealing with outputs that are richer than binary outputs (e.g., multiclass or multilabel), and for which theoretical guarantees exist, is still an area of intensive investigation. From a practical standpoint, many relaxations for learning with complex outputs have been devised. A common approach consists in decomposing the output space into "simpler" spaces so that the learning problem at hand can be reduced to a few easier (i.e., binary) learning tasks. For instance, this is the idea spurred by the Error-Correcting Output Codes (Dietterich & Bakiri, 1995) that make it possible to reduce multiclass or multilabel problems to binary classification tasks, e.g., (Allwein et al., 2001; Mroueh et al., 2012; Read et al., 2011; Tsoumakas & Vlahavas, 2007; Zhang & Schneider, 2012). In our work, we study the problem of complex output prediction by focusing on prediction functions that take the form of a weighted majority vote over a set of complex output classifiers (or voters). Recall that ensemble methods can all be seen as majority vote learning procedures (Dietterich, 2000; Re & Valentini, 2012). Methods such as Bagging (Breiman, 1996), Boosting (Schapire & Singer, 1999), and Random Forests (Breiman, 2001) are representative voting methods. Cortes et al. (2014) have proposed various ensemble methods for the structured output prediction framework. Note that majority votes are also central to the Bayesian approach (Gelman et al., 2004), with the notion of Bayesian model averaging (Domingos, 2000; Haussler et al., 1994), and that most kernel-based predictors, such as the Support Vector Machines (Boser et al., 1992; Cortes & Vapnik, 1995), may be viewed as weighted majority votes as well: for binary classification, the predicted class for some input x is computed as the sign of
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- North America > United States > California (0.04)
- North America > Canada > Quebec (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
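For context on the kernel-predictor remark in the abstract above, the standard SVM-style decision rule (written in our notation, not the authors' text) has the form:

```latex
% Kernel decision rule viewed as a weighted majority vote of the
% "voters" k(x_i, .); the alpha_i are learned nonnegative coefficients.
f(x) \;=\; \operatorname{sign}\!\Big( \sum_{i=1}^{n} \alpha_i\, y_i\, k(x_i, x) \;+\; b \Big)
```

Each training example contributes a voter $k(x_i,\cdot)$ with weight $\alpha_i$, which is exactly the weighted-majority-vote reading these papers build on.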