AITopics | Rivasplata, Omar


Rivasplata, Omar


A note on generalization bounds for losses with finite moments

Rodríguez-Gálvez, Borja, Rivasplata, Omar, Thobaben, Ragnar, Skoglund, Mikael

arXiv.org Machine Learning, Mar-25-2024

This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1 / \sqrt{n}$ when $p=2$, and a fast rate $1 / n$ when $p \to \infty$ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and single-draw PAC-Bayes. To do so, it obtains analogues of the PAC-Bayes fast rate bound for bounded losses from [2] in these settings.

artificial intelligence, machine learning, pac-bayes, (18 more...)

arXiv.org Machine Learning

2403.16681

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)


A Note on the Convergence of Denoising Diffusion Probabilistic Models

Mbacke, Sokhna Diarra, Rivasplata, Omar

arXiv.org Artificial Intelligence, Dec-10-2023

Diffusion models are one of the most important families of deep generative models. In this note, we derive a quantitative upper bound on the Wasserstein distance between the data-generating distribution and the distribution learned by a diffusion model. Unlike previous works in this field, our result does not make assumptions on the learned score function. Moreover, our bound holds for arbitrary data-generating distributions on bounded instance spaces, even those without a density w.r.t. the Lebesgue measure, and the upper bound does not suffer from exponential dependencies. Our main result builds upon the recent work of Mbacke et al. (2023) and our proofs are elementary.

artificial intelligence, assumption, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2312.05989

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


On the Role of Optimization in Double Descent: A Least Squares Study

Kuzborskij, Ilja, Szepesvári, Csaba, Rivasplata, Omar, Rannen-Triki, Amal, Pascanu, Razvan

arXiv.org Machine Learning, Jul-27-2021

Empirically, it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the test error has a second descent when the model becomes sufficiently overparameterized, as the model size itself acts as an implicit regularizer. In this paper we add to the growing body of work in this space, providing a careful study of learning dynamics as a function of model size for the least squares scenario. We show an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the covariance matrix of the input features, via a functional form that has the double descent behavior. This gives a new perspective on the double descent curves reported in the literature. Our analysis of the excess risk allows us to decouple the effects of optimization and generalization error. In particular, we find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view compared to existing work, shedding some light on a possible cause of this phenomenon, at least in the considered least squares setting. We empirically explore whether our predictions hold for neural networks, in particular whether the covariance of intermediary hidden activations behaves similarly to what our derivations predict.
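The quantity the abstract singles out, the smallest non-zero eigenvalue of the feature covariance, can be inspected with a short sketch. This is only an illustration under assumptions not taken from the paper: Gaussian random features, an empirical covariance $X^\top X / n$, and the hypothetical helper names `smallest_nonzero_eig` and `gd_least_squares`.

```python
import numpy as np

def smallest_nonzero_eig(X, rtol=1e-10):
    """Smallest non-zero eigenvalue of the empirical covariance X^T X / n.

    Computed from the singular values of X; eigenvalues below
    rtol * (largest eigenvalue) are treated as zero (an assumed cutoff).
    """
    n = X.shape[0]
    s = np.linalg.svd(X, compute_uv=False)
    eig = s**2 / n
    nonzero = eig[eig > rtol * eig.max()]
    return nonzero.min()

def gd_least_squares(X, y, steps, lr):
    """Plain gradient descent on the objective (1 / 2n) * ||X w - y||^2, from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Sweep the number of features d past the sample size n.
    for d in (10, 25, 50, 100, 200):
        X = rng.standard_normal((n, d))
        print(d, smallest_nonzero_eig(X))
```

For Gaussian features the printed eigenvalue is typically smallest when the number of features is close to the number of samples, which is the regime where double descent curves peak.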

deep learning, eigenvalue, neural network, (19 more...)

arXiv.org Machine Learning

2107.12685

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


Upper and Lower Bounds on the Performance of Kernel PCA

Haddouche, Maxime, Guedj, Benjamin, Rivasplata, Omar, Shawe-Taylor, John

arXiv.org Machine Learning, Dec-18-2020

Principal Component Analysis (PCA) is a popular method for dimension reduction and has attracted sustained interest for decades. Recently, kernel PCA has emerged as an extension of PCA but, despite its use in practice, a sound theoretical understanding of kernel PCA is missing. In this paper, we contribute lower and upper bounds on the efficiency of kernel PCA, involving the empirical eigenvalues of the kernel Gram matrix. Two bounds are for fixed estimators, and two are for randomized estimators through the PAC-Bayes theory. We control how much information is captured by kernel PCA on average, and we dissect the bounds to highlight strengths and limitations of the kernel PCA algorithm. We thereby contribute to a better understanding of kernel PCA. Our bounds are briefly illustrated on a toy numerical example.
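As a rough illustration of the empirical quantity these bounds involve, the sketch below computes the eigenvalues of a kernel Gram matrix and the share of the spectrum held by the top $k$ components. The RBF kernel, the feature-space centering step, and the helper names `rbf_gram` and `captured_fraction` are assumptions made for this example; the bounds themselves are not reproduced.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def captured_fraction(K, k):
    """Share of the centered Gram spectrum carried by the top-k eigenvalues."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H                        # Gram matrix of centered features
    eig = np.linalg.eigvalsh(Kc)[::-1]    # eigenvalues in descending order
    eig = np.clip(eig, 0.0, None)         # remove tiny negative round-off
    return eig[:k].sum() / eig.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    K = rbf_gram(X, gamma=0.5)
    for k in (1, 5, 20):
        print(k, captured_fraction(K, k))
```

The fraction is an in-sample quantity; how well it reflects the information captured on new data is the kind of question the paper's bounds address.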

artificial intelligence, kernel pca, machine learning, (15 more...)

arXiv.org Machine Learning

2012.10369

Country: Europe > United Kingdom (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Logarithmic Pruning is All You Need

Orseau, Laurent, Hutter, Marcus, Rivasplata, Omar

arXiv.org Machine Learning, Oct-25-2020

The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network. An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network. This latter result, however, relies on a number of strong assumptions and guarantees a polynomial factor on the size of the large network compared to the target function. In this work, we remove the most limiting assumptions of this previous work while providing significantly tighter bounds: the overparameterized network only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork.

deep learning, neural network, neuron, (20 more...)

arXiv.org Machine Learning

2006.12156

Country: North America > Canada (0.14)

Genre:

Research Report (0.40)
Contests & Prizes (0.34)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Communications > Networks (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


PAC-Bayes Analysis Beyond the Usual Bounds

Rivasplata, Omar, Kuzborskij, Ilja, Szepesvari, Csaba, Shawe-Taylor, John

arXiv.org Machine Learning, Oct-24-2020

We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed 'data-free' priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.
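For orientation, PAC-Bayes inequalities are commonly derived from the standard change-of-measure (Donsker-Varadhan) inequality. The statement below is that textbook fact, not the paper's theorem for stochastic kernels; the notation ($P$ for the prior, $Q$ for the posterior, $f$ for a measurable function of the hypothesis $h$) is assumed here for illustration.

```latex
% For any distributions Q \ll P over hypotheses and any measurable f
% with a finite exponential moment under P:
\mathbb{E}_{h \sim Q}\bigl[ f(h) \bigr]
  \;\le\;
  \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_{h \sim P}\bigl[ e^{f(h)} \bigr]
```

The last term is an exponential moment of the kind the abstract refers to: requirements such as bounded losses or i.i.d. data usually enter only when that term is upper-bounded.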

artificial intelligence, bayesian inference, inequality, (16 more...)

arXiv.org Machine Learning

2006.13057

Country:

North America > Canada (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


PAC-Bayes unleashed: generalisation bounds with unbounded losses

Haddouche, Maxime, Guedj, Benjamin, Rivasplata, Omar, Shawe-Taylor, John

arXiv.org Machine Learning, Sep-30-2020

Since its emergence in the late 90s, the PAC-Bayes theory (see the seminal papers by Shawe-Taylor and Williamson, 1997 and McAllester, 1998, 1999, or the recent survey by Guedj, 2019) has been a powerful tool to obtain generalisation bounds and derive efficient learning algorithms. PAC-Bayes bounds were originally meant for binary classification problems (Seeger, 2002; Langford, 2005; Catoni, 2007) but the literature now includes many contributions involving any bounded loss function (without loss of generality, with values in $[0, 1]$), not just the binary loss. Generalisation bounds are helpful to ensure that a learning algorithm will have a good performance on future similar batches of data. Our goal is to provide new PAC-Bayesian generalisation bounds holding for unbounded loss functions, and thus extend the usability of PAC-Bayes to a much larger class of learning problems. Some ways to circumvent the bounded range assumption on the losses have been addressed in the recent literature.

artificial intelligence, machine learning, proposition 4, (17 more...)

arXiv.org Machine Learning

2006.07279

Country:

Europe > United Kingdom > England (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Tighter risk certificates for neural networks

Pérez-Ortiz, María, Rivasplata, Omar, Shawe-Taylor, John, Szepesvári, Csaba

arXiv.org Machine Learning, Aug-12-2020

This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates that are valid on any unseen examples for the learnt predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.
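To make the notion of a risk certificate concrete, here is a minimal sketch that inverts the binary KL divergence in the classical PAC-Bayes-kl bound (in Maurer's form). It is not the paper's procedure: it assumes a loss in $[0, 1]$, i.i.d. data, a data-free prior, and that the empirical risk of the randomized predictor is known exactly. The function names and the example numbers are hypothetical.

```python
import math

def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, 0.0), 1.0)
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    t1 = 0.0 if q == 0.0 else q * math.log(q / p)
    t2 = 0.0 if q == 1.0 else (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return t1 + t2

def kl_inverse_upper(q, c, tol=1e-10):
    """Approximate the largest p in [q, 1] with binary_kl(q, p) <= c by bisection."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return hi  # upper end of the bracket, so the result errs on the safe side

def pac_bayes_kl_certificate(emp_risk, kl_qp, n, delta=0.05):
    """Upper bound on the expected risk from the PAC-Bayes-kl bound.

    emp_risk: empirical risk of the randomized predictor, in [0, 1]
    kl_qp:    KL(Q || P) between posterior and prior, in nats
    n:        number of training examples
    delta:    the bound holds with probability at least 1 - delta
    """
    c = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)

if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(pac_bayes_kl_certificate(emp_risk=0.02, kl_qp=500.0, n=60000))
```

A smaller KL term or a larger sample size tightens the certificate, which is why the choice of prior matters in this line of work.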

deep learning, neural network, risk certificate, (19 more...)

arXiv.org Machine Learning

2007.12911

Country:

Europe > United Kingdom > England (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)


PAC-Bayes with Backprop

Rivasplata, Omar, Tankasali, Vikram M, Szepesvari, Csaba

arXiv.org Machine Learning, Aug-23-2019

We explore a method to train probabilistic neural networks by minimizing risk upper bounds, specifically, PAC-Bayes bounds. Thus randomization is not just part of a proof strategy, but part of the learning algorithm itself. We derive two training objectives, one from a previously known PAC-Bayes bound, and a second one from a novel PAC-Bayes bound. We evaluate both training objectives on various data sets and demonstrate the tightness of the risk upper bounds achieved by our method. Our training objectives have sound theoretical justification, and lead to self-bounding learning where all the available data may be used to learn a predictor and certify its risk, with no need to follow a data-splitting protocol.

backprop

arXiv.org Machine Learning

1908.0738

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


PAC-Bayes bounds for stable algorithms with instance-dependent priors

Rivasplata, Omar, Szepesvari, Csaba, Shawe-Taylor, John S., Parrado-Hernandez, Emilio, Sun, Shiliang

Neural Information Processing Systems, Dec-31-2018

PAC-Bayes bounds have been proposed to get risk estimates based on a training sample. In this paper the PAC-Bayes approach is combined with stability of the hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting is used with a Gaussian prior centered at the expected output. Thus a novelty of our paper is using priors defined in terms of the data-generating distribution. Our main result estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients. We also provide a new bound for the SVM classifier, which is compared to other known bounds experimentally. Ours appears to be the first uniform hypothesis stability-based bound that evaluates to non-trivial values.

artificial intelligence, machine learning, stability, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.67)
