
Collaborating Authors

 Belius, David


A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression

arXiv.org Artificial Intelligence

This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions. Our contributions are three-fold: 1) we analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel; 2) we demonstrate the validity of the Gaussian Equivalent Property (GEP), which states that the generalization performance of KRR remains the same when the whitened features are replaced by standard Gaussian vectors, thereby shedding light on the success of previous analyses under the Gaussian Design Assumption; 3) we derive novel bounds that improve over existing bounds across a broad range of settings, such as (in)dependent feature vectors and various combinations of eigen-decay rates in the over/underparameterized regimes.
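
For readers who want to see the Gaussian Equivalent Property idea in action, the following minimal sketch (our own illustration, not the paper's construction) fits KRR once on whitened non-Gaussian random features and once on standard Gaussian vectors of the same dimensions with the same planted target, and compares the two test errors; the feature map, target, and ridge level are all assumed for illustration.

```python
# Minimal numerical sketch of the Gaussian Equivalent Property (GEP) idea: compare the
# KRR test error obtained with whitened (non-Gaussian) features against the error
# obtained when those features are replaced by i.i.d. standard Gaussian vectors.
# Every modelling choice below (random-feature map, planted target, ridge level)
# is an illustrative assumption, not the construction used in the paper.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d, p, ridge, noise = 300, 300, 20, 200, 1e-2, 0.1

def features(X, W):
    return np.cos(X @ W) / np.sqrt(W.shape[1])      # assumed non-Gaussian feature map

def krr_test_error(Ztr, Zte, ytr, yte):
    K = Ztr @ Ztr.T                                  # linear kernel on the features
    alpha = np.linalg.solve(K + ridge * np.eye(len(ytr)), ytr)
    return np.mean((Zte @ Ztr.T @ alpha - yte) ** 2)

W = rng.standard_normal((d, p))
beta = rng.standard_normal(p) / np.sqrt(p)           # planted target in feature space

# Whitening statistics estimated on a large held-out sample.
Phi_big = features(rng.standard_normal((20000, d)), W)
mean = Phi_big.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Phi_big, rowvar=False) + 1e-6 * np.eye(p))
cov_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

def whitened(X):
    return (features(X, W) - mean) @ cov_isqrt       # ~ zero mean, identity covariance

Xtr, Xte = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
Ztr, Zte = whitened(Xtr), whitened(Xte)
ytr, yte = Ztr @ beta + noise * rng.standard_normal(n), Zte @ beta

# Gaussian surrogate design: same sizes and target, but i.i.d. N(0, 1) features.
Gtr, Gte = rng.standard_normal((n, p)), rng.standard_normal((n_test, p))
gtr, gte = Gtr @ beta + noise * rng.standard_normal(n), Gte @ beta

print("test error, whitened features :", krr_test_error(Ztr, Zte, ytr, yte))
print("test error, Gaussian surrogate:", krr_test_error(Gtr, Gte, gtr, gte))
```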


Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

arXiv.org Artificial Intelligence

Kernel regression plays a pivotal role in machine learning, since it offers an expressive and rapidly trainable framework for modeling complex relationships in data. In recent years, kernels have regained significance in deep learning theory, since many deep neural networks (DNNs) can be understood as converging to certain kernel limits. The significance of kernel regression has been underscored by its ability to approximate DNN training under certain conditions, providing a tractable avenue for analytical exploration of the test error and robust theoretical guarantees Jacot et al. [2018], Arora et al. [2019], Bordelon et al. [2020]. The adaptability of kernel regression positions it as a crucial tool in various machine learning applications, making it imperative to comprehensively understand its behavior, particularly concerning overfitting. Despite the increasing attention directed towards kernel ridge regression, the existing literature predominantly concentrates on overfitting phenomena in either the high input dimensional regime or the asymptotic regime Liang and Rakhlin [2020], Mei and Montanari [2022], Misiakiewicz [2022], also known as the ultra-high dimensional regime Zou and Zhang [2009], Fan et al. [2009]. Notably, the focus on asymptotic bounds, which require the input dimension to approach infinity, may not align with the finite nature of real-world datasets and target functions. Similarly, classical Rademacher-based bounds, e.g. Bartlett and Mendelson [2002], require that the weights of the kernel regressor satisfy data-independent a priori bounds, a restriction that is not enforced in standard kernel ridge regression algorithms. These mismatches between idealized mathematical assumptions and practical implementations necessitate a more nuanced exploration of overfitting in kernel regression at fixed input dimension.
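
As a concrete reference point for "kernel ridgeless regression" and "the eigenspectrum", the following sketch (an illustration under assumed choices, not the paper's setup) computes the minimum-norm interpolant for an RBF kernel and prints the eigenvalue decay of the kernel matrix along with the train and test errors.

```python
# Minimal sketch (not the paper's analysis): ridgeless kernel regression, i.e. the
# minimum-norm interpolant, alongside the eigenspectrum of the kernel matrix.
# The Gaussian (RBF) kernel, bandwidth, target and noise level are all illustrative
# assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, n_test, d, noise = 200, 500, 5, 0.1

def rbf_kernel(A, B, bandwidth=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def target(X):
    return np.sin(X.sum(axis=1))

Xtr, Xte = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
ytr = target(Xtr) + noise * rng.standard_normal(n)

K = rbf_kernel(Xtr, Xtr)
eigvals = np.linalg.eigvalsh(K)[::-1]               # eigenspectrum, largest first
print("top 5 eigenvalues:", np.round(eigvals[:5], 4))
print("decay ratio lambda_50 / lambda_1:", eigvals[49] / eigvals[0])

# Ridgeless fit: the pseudo-inverse yields the minimum-norm interpolating coefficients.
alpha = np.linalg.pinv(K) @ ytr
print("train MSE (interpolation):", np.mean((K @ alpha - ytr) ** 2))
print("test MSE:", np.mean((rbf_kernel(Xte, Xtr) @ alpha - target(Xte)) ** 2))
```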


A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

arXiv.org Machine Learning

Generalization is a central theme in statistical learning theory. The recent renewed interest in kernel methods, especially in Kernel Ridge Regression (KRR), is largely due to the fact that deep neural network (DNN) training can be approximated using kernels under appropriate conditions Jacot et al. [2018], Arora et al. [2019], Bordelon et al. [2020], in which case the test error is more tractable analytically and thus enjoys stronger theoretical guarantees. However, many prior results have been derived under conditions incompatible with practical settings. For instance, Liang and Rakhlin [2020], Liu et al. [2021a], Mei et al. [2021], Misiakiewicz [2022] give asymptotic bounds on the KRR test error, which require the input dimension d to tend to infinity. In reality, the input dimension of the dataset and the target function is typically finite.
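
To make "finite-rank kernel ridge regression" concrete, the sketch below (illustrative assumptions throughout, not the paper's setting) runs KRR with a kernel induced by an explicit finite feature map, so the kernel matrix has rank bounded by the feature dimension regardless of the sample size.

```python
# Minimal sketch (assumptions throughout): kernel ridge regression with a finite-rank
# kernel, i.e. k(x, x') = phi(x) . phi(x') for a finite-dimensional feature map phi.
# Here phi is a small polynomial feature map; the target and noise are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, n_test, d, ridge, noise = 150, 1000, 3, 1e-3, 0.1

def phi(X):
    # Finite feature map: constant, linear and squared coordinates (rank at most 1 + 2d).
    return np.hstack([np.ones((len(X), 1)), X, X ** 2])

def finite_rank_kernel(A, B):
    return phi(A) @ phi(B).T             # kernel matrix rank is at most phi's dimension

def target(X):
    return X[:, 0] - 0.5 * X[:, 1] ** 2  # lies in the span of the feature map

Xtr, Xte = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
ytr = target(Xtr) + noise * rng.standard_normal(n)

K = finite_rank_kernel(Xtr, Xtr)
print("rank of K:", np.linalg.matrix_rank(K), "out of n =", n)

alpha = np.linalg.solve(K + ridge * np.eye(n), ytr)
pred = finite_rank_kernel(Xte, Xtr) @ alpha
print("test MSE:", np.mean((pred - target(Xte)) ** 2))
```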


Injectivity of ReLU networks: perspectives from statistical physics

arXiv.org Artificial Intelligence

When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry, giving rise to a conjectured sharp injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min-max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.
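
The paper's replica and Gordon min-max arguments are not reproduced here, but the elementary geometric mechanism behind non-injectivity can be demonstrated numerically. The sketch below (our own illustration, with $m = n$ chosen so that a witness is easy to construct) builds a direction whose active rows fail to span $\mathbb{R}^n$ and verifies that a small orthogonal perturbation leaves $\mathrm{ReLU}(Wx)$ unchanged.

```python
# Illustrative sketch only (not the paper's replica or Gordon min-max analysis): the
# elementary geometric mechanism behind non-injectivity of x -> ReLU(Wx).  If for some
# x the "active" rows {w_i : <w_i, x> > 0} do not span R^n, then perturbing x inside
# their orthogonal complement, gently enough that the strictly negative rows stay
# negative, leaves ReLU(Wx) unchanged.  Below such an x is built explicitly for m = n,
# where span-deficiency is easy to arrange; the paper asks at which aspect ratio
# alpha = m / n such directions cease to exist for Gaussian W.
import numpy as np

rng = np.random.default_rng(4)
n = m = 10
W = rng.standard_normal((m, n))

# Choose x0 so that row 1 is strictly inactive and rows 2..m are strictly active.
y = np.ones(m)
y[0] = -1.0
x0 = np.linalg.solve(W, y)

active = W[W @ x0 > 0]                # m - 1 = n - 1 rows: they cannot span R^n
v = np.linalg.svd(active)[2][-1]      # unit vector orthogonal to every active row

eps = 1e-3                            # small enough to keep the inactive row negative
x1 = x0 + eps * v
print("inputs differ:     ", not np.allclose(x0, x1))
print("ReLU outputs agree:",
      np.allclose(np.maximum(W @ x0, 0.0), np.maximum(W @ x1, 0.0)))
```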


On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures

arXiv.org Machine Learning

The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory of deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models, has been confirmed empirically for certain wide architectures. It remains an open question how well NTK theory models standard neural network architectures of widths common in practice, trained on complex datasets such as ImageNet. We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet, and find that their behavior deviates significantly from that of their finite-width NTK counterparts. For wider versions of these networks, where the number of channels and the widths of the fully-connected layers are increased, the deviation decreases.
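
For concreteness, the empirical (finite-width) NTK can be computed directly from parameter gradients. The sketch below does this in closed form for a toy one-hidden-layer ReLU network; it is only meant to make the object studied in the paper concrete and does not reproduce the AlexNet/LeNet experiments.

```python
# Minimal sketch of the empirical (finite-width) NTK for a toy one-hidden-layer ReLU
# network f(x) = a^T relu(W x) / sqrt(width): the NTK between two inputs is the inner
# product of the parameter gradients of the network output.  This toy model is an
# assumption for illustration; the paper studies AlexNet/LeNet-style CNNs.
import numpy as np

rng = np.random.default_rng(5)
d, width, n = 8, 512, 6

W = rng.standard_normal((width, d))            # hidden-layer weights
a = rng.standard_normal(width)                 # output weights

def grads(x):
    """Gradient of f(x) with respect to all parameters, flattened."""
    pre = W @ x                                 # pre-activations
    act = np.maximum(pre, 0.0)                  # relu
    df_da = act / np.sqrt(width)                                    # d f / d a_j
    df_dW = (a * (pre > 0))[:, None] * x[None, :] / np.sqrt(width)  # d f / d W_{j,:}
    return np.concatenate([df_da, df_dW.ravel()])

X = rng.standard_normal((n, d))
G = np.stack([grads(x) for x in X])             # n x (#params) Jacobian of outputs
ntk = G @ G.T                                   # empirical NTK Gram matrix, n x n

print("empirical NTK Gram matrix:\n", np.round(ntk, 3))
```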