AITopics | banach space

Collaborating Authors

banach space

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Schmocker, Philipp, Teichmann, Josef

arXiv.org Machine LearningJun-30-2026

We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. A FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2606.0982

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.27)

Genre:

Instructional Material (0.45)
Research Report (0.40)

Industry: Banking & Finance (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Ongie, Greg, Parhi, Rahul

arXiv.org Machine LearningJun-26-2026

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

artificial intelligence, machine learning, representation cost, (18 more...)

arXiv.org Machine Learning

2606.14954

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.27)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis

Neural Information Processing SystemsJun-17-2026, 21:59:17 GMT

This paper explores the use of Hermite transform based reproducing kernel Banach space methods to construct exact or un-approximated models of feedforward neural networks of arbitrary width, depth and topology, including ResNet and Transformers networks, assuming only a feedforward topology, finite energy activations and finite (spectral-) norm weights and biases. Using this model, two straightforward but surprisingly tight bounds on Rademacher complexity are derived, precisely (1) a general bound that is width-independent and scales exponentially with depth; and (2) a width-and depth-independent bound for networks with appropriately constrained (below threshold) weights and biases.

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

Hübler, Florian, Pethick, Thomas, Sra, Suvrit

arXiv.org Machine LearningJun-15-2026

Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{Δ_1 L}{\varepsilon^2} \left(\frac σ\varepsilon \right)^{\frac p {p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2606.1456

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

Gaussian Processes with Sample Paths in Reproducing Kernel Banach Spaces

Karvonen, Toni, Sørensen, Rasmus Kleist Hørlyck

arXiv.org Machine LearningMay-28-2026

We investigate the connection between Gaussian processes and Gaussian random elements in reproducing kernel Banach spaces. We show that the covariance operator of a weak second-order Radon probability measure on such a space is uniquely determined by a positive definite function. In the Gaussian case, we characterize those positive definite functions that arise from covariance operators in terms of $γ$-radonifying operators. Building on these results, we extend the classical Driscoll theorem to the Banach space setting.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Machine Learning

2605.28106

Country:

North America > United States (0.28)
Europe > Finland (0.28)
Europe > Denmark (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Error Bounds for Learning with Vector-Valued Random Features

S. Lanthaler, N. H. Nelsen

Neural Information Processing SystemsApr-30-2026, 02:18:58 GMT

This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Add feedback

Lipschitz bounds for integral kernels

Reverdi, Justin, Zhang, Sixin, Gamboa, Fabrice, Gratton, Serge

arXiv.org Machine LearningApr-6-2026

Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and apply these results to several important classes of kernels. For infinite width two-layer neural network with isotropic Gaussian weight distributions, we show that the Lipschitz constant of the associated kernel can be expressed as the supremum of a two-dimensional integral, leading to an explicit characterization for the Gaussian kernel and the ReLU random neural network kernel. We also study continuous and shift-invariant kernels such as Gaussian, Laplace, and Matérn kernels, which admit an interpretation as neural network with cosine activation function. In this setting, we prove that the feature map is Lipschitz continuous if and only if the weight distribution has a finite second-order moment, and we then derive its Lipschitz constant. Finally, we raise an open question concerning the asymptotic behavior of the convergence of the Lipschitz constant in finite width neural networks. Numerical experiments are provided to support this behavior.

artificial intelligence, lipschitz constant, machine learning, (15 more...)

arXiv.org Machine Learning

2604.02887

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Greedy Feature Construction

Neural Information Processing SystemsMar-23-2026, 04:07:42 GMT

We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity - large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. The effectiveness of the approach is evaluated empirically by fitting a linear ridge regression model in the constructed feature space and our empirical results indicate a superior performance of our approach over competing methods.

artificial intelligence, machine learning, sequence, (18 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Filters

Collaborating Authors

banach space

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

Gaussian Processes with Sample Paths in Reproducing Kernel Banach Spaces

e9b8a3362a6d9a7f9f842bd2d919e1a0-Paper-Conference.pdf

Error Bounds for Learning with Vector-Valued Random Features

26901debb30ea03f0aa833c9de6b81e9-Supplemental.pdf

Lipschitz bounds for integral kernels

Greedy Feature Construction