generalization error bound
Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure
Two-stage recommender systems play a crucial role in efficiently identifying relevant items and personalizing recommendations from a vast array of options. This paper, based on an error decomposition framework, analyzes the generalization error of two-stage recommender systems with a tree structure, which consist of an efficient tree-based retriever and a more precise yet time-consuming ranker. We use the Rademacher complexity to establish generalization upper bounds for various tree-based retrievers using beam search, as well as for different ranker models under a shifted training distribution. Both theoretical insights and practical experiments on real-world datasets indicate that increasing the number of branches in tree-based retrievers and harmonizing distributions across stages can enhance the generalization performance of two-stage recommender systems.
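To make the retrieval stage concrete, the following is a minimal sketch (not the paper's implementation) of beam search over a tree-based retriever: each level keeps only the top-scoring nodes, so widening the beam or increasing the branching factor enlarges the candidate set handed to the slower ranker. The `Node` class and the `score(user, node)` preference model are illustrative placeholders.

```python
# Minimal sketch of beam-search retrieval over a tree index (illustrative only).
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    item: int                                  # item id carried by the node
    children: list = field(default_factory=list)

def beam_search(root, user, score, beam_width):
    """Descend the tree level by level, keeping the `beam_width` children with the
    highest score(user, node). Assumes all leaves sit at the same depth; the
    surviving leaves are the candidates passed to the ranker."""
    beam = [root]
    while any(node.children for node in beam):
        candidates = [c for node in beam for c in node.children]
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: score(user, c))
    return [node.item for node in beam]
```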
Lean Formalization of Generalization Error Bound by Rademacher Complexity
Sonoda, Sho, Kasaura, Kazumi, Mizuno, Yuma, Tsukamoto, Kei, Onda, Naoto
We formalize the generalization error bound using Rademacher complexity in the Lean 4 theorem prover. Generalization error quantifies the gap between a learning machine's performance on given training data versus unseen test data, and Rademacher complexity serves as an estimate of this error based on the complexity of the learning machine's hypothesis class. Unlike traditional tools such as PAC learning and the VC dimension, Rademacher complexity is applicable across diverse machine learning scenarios, including deep learning and kernel methods. We formalize key concepts and theorems, including the empirical and population Rademacher complexities, and establish generalization error bounds through formal proofs of McDiarmid's inequality, Hoeffding's lemma, and symmetrization arguments. We make our code available on GitHub.
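For reference, the standard statement being formalized (in the form given, e.g., by Mohri et al.; the exact constants and phrasing in the Lean development may differ) reads: for a class $\mathcal F$ of functions with values in $[0,1]$ and an i.i.d. sample $S = (z_1,\dots,z_n)$, with probability at least $1-\delta$ every $f \in \mathcal F$ satisfies
\[
\mathbb{E}[f(Z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(z_i) \;+\; 2\,\mathfrak{R}_n(\mathcal F) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
\qquad
\widehat{\mathfrak{R}}_S(\mathcal F) \;=\; \mathbb{E}_{\sigma}\Big[\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\sigma_i f(z_i)\Big],
\]
where $\mathfrak{R}_n(\mathcal F) = \mathbb{E}_S\big[\widehat{\mathfrak{R}}_S(\mathcal F)\big]$ is the population Rademacher complexity and the $\sigma_i$ are i.i.d. uniform $\{\pm 1\}$ signs. McDiarmid's inequality controls the deviation term, and the symmetrization argument introduces the signs $\sigma_i$.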
Generalization Error Bounds on Deep Learning with Markov Datasets
In this paper, we derive upper bounds on generalization errors for deep neural networks with Markov datasets. These bounds are developed based on Koltchinskii and Panchenko's approach for bounding the generalization error of combined classifiers with i.i.d. datasets. The development of new symmetrization inequalities in high-dimensional probability for Markov chains is a key element in our extension, where the spectral gap of the infinitesimal generator of the Markov chain serves as a key parameter in these inequalities. We also propose a simple method to convert these bounds and other similar ones in traditional deep learning and machine learning to Bayesian counterparts for both i.i.d. and Markov datasets. Extensions to $m$-order homogeneous Markov chains (such as AR and ARMA models) and to mixtures of several Markov data sources are also given.
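As a schematic of how a spectral gap typically enters such results (constants vary across the literature, and this is not the paper's exact statement): for a stationary, sufficiently well-behaved chain with spectral gap $\gamma$, the sample size $n$ in Hoeffding-type deviation terms is effectively replaced by something of order $\gamma n$,
\[
\text{i.i.d.:}\;\; \sqrt{\frac{\log(1/\delta)}{2n}}
\qquad\longrightarrow\qquad
\text{Markov:}\;\; c\,\sqrt{\frac{\log(1/\delta)}{\gamma\, n}},
\]
so a slowly mixing chain (small $\gamma$) yields a correspondingly looser generalization bound.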
Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
We prove generalization error bounds for predicting entries in a partially observed matrix by fitting the observed entries with a low-rank matrix. In justifying the analysis approach we take to obtain the bounds, we present an example of a class of functions of finite pseudodimension such that the sums of functions from this class have unbounded pseudodimension. "Collaborative filtering" refers to the general task of providing users with information on what items they might like, or dislike, based on their preferences so far and how they relate to the preferences of other users. This approach contrasts with a more traditional feature-based approach where predictions are made based on features of the items. For feature-based approaches, we are accustomed to studying prediction methods in terms of probabilistic post-hoc generalization error bounds. Such results provide us a (probabilistic) bound on the performance of our predictor on future examples, in terms of its performance on the training data.
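The estimation setting these bounds cover can be sketched in a few lines (a hypothetical gradient-descent fit, not the paper's algorithm): only the observed entries constrain the rank-$k$ factorization, and the bounds control how well the fitted matrix extends to the unobserved entries.

```python
import numpy as np

def fit_low_rank(M, mask, rank, steps=2000, lr=0.01, seed=0):
    """Fit the observed entries (mask == 1) of M with a rank-`rank` matrix U @ V.T
    by gradient descent on the squared error over observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(steps):
        R = mask * (U @ V.T - M)                   # residual on observed entries only
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U @ V.T                                 # prediction for all entries
```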
A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
A novel linear feature selection algorithm is presented based on the global minimization of a data-dependent generalization error bound. Feature selection and scaling algorithms often lead to non-convex optimization problems, which in many previous approaches were addressed through gradient descent procedures that can only guarantee convergence to a local minimum. We propose an alternative approach, whereby the global solution of the non-convex optimization problem is derived via an equivalent optimization problem. Moreover, the convex optimization task is reduced to a conic quadratic programming problem for which efficient solvers are available. Highly competitive numerical results on both artificial and real-world data sets are reported.
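For readers unfamiliar with the problem class, a conic quadratic (second-order cone) program looks like the toy example below; the objective and constraints are placeholders, not the paper's bound-minimization formulation.

```python
# Toy second-order cone program in cvxpy (illustrative data, not the paper's model).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A, b, c = rng.standard_normal((5, 3)), rng.standard_normal(5), np.ones(3)
x = cp.Variable(3)
problem = cp.Problem(cp.Minimize(c @ x),
                     [cp.norm(A @ x + b, 2) <= 2.0,   # second-order cone constraint
                      x >= 0])
problem.solve()                                       # dispatched to an SOCP-capable solver
print(problem.status, x.value)
```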
Generalization Error Bounds for Multiclass Sparse Linear Classifiers
Levy, Tomer, Abramovich, Felix
We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
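As a hedged illustration of the simplest ("global") sparsity pattern, entrywise $\ell_1$-penalized multinomial logistic regression can be fit with scikit-learn; the row-wise and low-rank penalties analyzed in the paper require dedicated solvers and are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional multiclass data with only a few informative features.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
# Entrywise l1 penalty ("global sparsity"); C plays the role of the tuning parameter.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)
print((clf.coef_ != 0).sum(), "nonzero coefficients out of", clf.coef_.size)
```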
Generalization Error Bounds via $m$th Central Moments of the Information Density
Hellström, Fredrik, Durisi, Giuseppe
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms. Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities, both for the case in which a new hypothesis is randomly generated every time the algorithm is used - as often assumed in the probably approximately correct (PAC)-Bayesian literature - and in the single-draw case, where the hypothesis is extracted only once. For this last scenario, we present a novel bound that is explicit in the central moments of the information density. The bound reveals that the higher the order of the information density moment that can be controlled, the milder the dependence of the generalization bound on the desired confidence level. Furthermore, we use tools from binary hypothesis testing to derive a second bound, which is explicit in the tail of the information density. This bound confirms that a fast decay of the tail of the information density yields a more favorable dependence of the generalization bound on the confidence level.
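To fix notation, the classical average-case bound that such moment- and tail-based analyses refine (Xu and Raginsky, 2017; not the paper's new result) uses the information density $\imath(w, z^n) = \log \frac{\mathrm{d}P_{W Z^n}}{\mathrm{d}(P_W \otimes P_{Z^n})}(w, z^n)$: for a loss that is $\sigma$-sub-Gaussian under the data distribution,
\[
\big|\mathbb{E}[\mathrm{gen}(W, Z^n)]\big| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; Z^n)},
\qquad I(W; Z^n) = \mathbb{E}\big[\imath(W, Z^n)\big],
\]
so controlling higher moments or the tail of $\imath(W, Z^n)$, rather than only its mean, is what sharpens the dependence on the confidence level.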
Generalization Error Bounds for Deep Variational Inference
Chérief-Abdellatif, Badr-Eddine
Variational inference is becoming more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Meanwhile, a few recent works have provided theoretical justification and new insights on deep neural networks for estimating smooth functions in usual settings such as nonparametric regression. In this paper, we show that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference. In particular, we highlight the connection between estimation and approximation theories via the classical bias-variance trade-off and show that it leads to near-minimax rates of convergence for Hölder smooth functions. Additionally, we show that the model selection framework over the neural network architecture via ELBO maximization does not overfit and adaptively achieves the optimal rate of convergence.
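For concreteness, the objective behind "ELBO maximization" is the standard evidence lower bound; the sparse-network prior and the variational family used in the paper are not spelled out here:
\[
\widehat{q} \;=\; \arg\max_{q \in \mathcal Q}\; \mathrm{ELBO}(q), \qquad
\mathrm{ELBO}(q) \;=\; \mathbb{E}_{\theta \sim q}\big[\log p(D \mid \theta)\big] \;-\; \mathrm{KL}\big(q \,\|\, \pi\big),
\]
so maximizing the ELBO trades data fit against divergence of the variational posterior from the prior $\pi$.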
On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
Li, Jian, Luo, Xuanyuan, Qiao, Mingda
Generalization error (also known as the out-of-sample error) measures how well the hypothesis obtained from the training data can generalize to previously unseen data. Obtaining tight generalization error bounds is central to statistical learning theory. In this paper, we study the generalization error bound in learning general non-convex objectives, which has attracted significant attention in recent years. In particular, we study the (algorithm-dependent) generalization bounds of various iterative gradient-based methods. (1) We present a very simple and elementary proof of a recent result for stochastic gradient Langevin dynamics (SGLD), due to Mou et al. (2018). Our proof can be easily extended to obtain similar generalization bounds for several other variants of SGLD (e.g., with postprocessing, momentum, mini-batch, acceleration, and more general noises), and improves upon the recent results in Pensia et al. (2018). (2) By incorporating ideas from the PAC-Bayesian theory into the stability framework, we obtain tighter distribution-dependent (or data-dependent) generalization bounds. Our bounds provide an intuitive explanation for the phenomenon reported in Zhang et al. (2017a). (3) We also study the setting where the total loss is the sum of a bounded loss and an additional $\ell_2$ regularization term. We obtain new generalization bounds for the continuous Langevin dynamics in this setting by leveraging the Log-Sobolev inequality. Our new bounds are more desirable when the noise level of the process is not small, and do not grow as T approaches infinity.
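For reference, a minimal SGLD sketch (generic, not tied to the paper's assumptions): each iteration takes a mini-batch gradient step and injects Gaussian noise whose variance is set by the step size and an inverse temperature.

```python
import numpy as np

def sgld(grad_loss, theta0, data, lr=1e-3, beta=1e4, epochs=10, batch=32, seed=0):
    """Stochastic gradient Langevin dynamics:
    theta <- theta - lr * g + sqrt(2 * lr / beta) * xi,
    where g is a mini-batch gradient and xi is standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    data = np.asarray(data)
    n = len(data)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for i in range(0, n, batch):
            g = grad_loss(theta, data[idx[i:i + batch]])
            theta = theta - lr * g + np.sqrt(2 * lr / beta) * rng.standard_normal(theta.shape)
    return theta
```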