AITopics | Steinwart, Ingo

Collaborating Authors

Steinwart, Ingo

Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data

Holzmüller, David, Grinsztajn, Léo, Steinwart, Ingo

arXiv.org Artificial Intelligence, Jul-5-2024

For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by often much slower deep learning methods with extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) improved default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 71 classification and 47 regression datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 48 classification and 42 regression datasets, as well as the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results show that RealMLP offers a better time-accuracy tradeoff than other neural nets and is competitive with GBDTs. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results on medium-sized tabular datasets (1K to 500K samples) without hyperparameter tuning.

artificial intelligence, deep learning, machine learning

arXiv:2407.04491

Country:

Asia > China (0.67)
Europe > Germany > Baden-Württemberg (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Conditioning of Banach Space Valued Gaussian Random Variables: An Approximation Approach Based on Martingales

Steinwart, Ingo

arXiv.org Artificial Intelligence, Apr-4-2024

In this paper we investigate the conditional distributions of two Banach space valued, jointly Gaussian random variables. These conditional distributions are again Gaussian and their means and covariances are determined by a general approximation scheme based upon a martingale idea. We then apply our general results to the case of Gaussian processes with continuous paths conditioned to partial observations of their paths.
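
For intuition, the finite-dimensional case of this conditioning problem can be sketched with the textbook formulas for jointly Gaussian vectors: the conditional mean is mu_X + C_XY C_YY^{-1} (y - mu_Y) and the conditional covariance is C_XX - C_XY C_YY^{-1} C_YX. This is not the paper's martingale-based approximation scheme, which targets Banach space valued variables; it only shows the finite-dimensional objects such a scheme generalizes. The example (a Brownian motion with covariance min(s, t), observed at t = 1, which gives a Brownian bridge) and the function name `condition_gaussian` are illustrative assumptions.

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, cov_xx, cov_xy, cov_yy, y_obs):
    """Mean and covariance of X given Y = y_obs for jointly Gaussian (X, Y)."""
    # gain = C_xy C_yy^{-1}, computed with a linear solve (C_yy is symmetric)
    gain = np.linalg.solve(cov_yy, cov_xy.T).T
    mean = mu_x + gain @ (y_obs - mu_y)
    cov = cov_xx - gain @ cov_xy.T
    return mean, cov

# Assumed example: Brownian motion, Cov(B_s, B_t) = min(s, t),
# observed at t = 1 with value 1 (the result is a Brownian bridge).
ts = np.array([0.25, 0.5, 0.75])
cov_xx = np.minimum.outer(ts, ts)
cov_xy = np.minimum.outer(ts, np.array([1.0]))
cov_yy = np.array([[1.0]])
mean, cov = condition_gaussian(np.zeros(3), np.zeros(1), cov_xx, cov_xy, cov_yy, np.array([1.0]))
# mean -> [0.25, 0.5, 0.75]; diag(cov) -> [0.1875, 0.25, 0.1875]
```

The conditional covariance min(s, t) - s t is again a valid Gaussian covariance, matching the abstract's remark that the conditional distributions are again Gaussian.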

artificial intelligence, cov, machine learning

arXiv:2404.03453

Country:

Europe (0.46)
Asia (0.45)
North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers

Pförtner, Marvin, Steinwart, Ingo, Hennig, Philipp, Wenger, Jonathan

arXiv.org Machine Learning, Dec-2-2023

Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application, and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting the solution of linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.
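
A finite-dimensional sketch of the idea, under assumptions of our own: the GP prior lives on a grid of n points in [0, 1] with an exponential (Matern-1/2) covariance, and the bounded linear operator is replaced by a finite-difference matrix L encoding the 1D Poisson problem -u'' = f with u(0) = u(1) = 0. Conditioning on L u = y uses the usual GP regression formula with K replaced by L K L^T. With one observation per grid point, L is invertible and the posterior mean reproduces the classical finite-difference solution, loosely echoing the abstract's point that collocation-type solvers arise as special cases; the paper itself works with the exact operator on a function space. The names `poisson_operator` and `condition_on_linear_obs` are hypothetical.

```python
import numpy as np

def poisson_operator(n):
    """Hypothetical discretization: rows 1..n-2 apply the finite-difference -u'',
    rows 0 and n-1 observe the boundary values u(0) and u(1)."""
    h = 1.0 / (n - 1)
    L = np.zeros((n, n))
    L[0, 0] = L[-1, -1] = 1.0
    for i in range(1, n - 1):
        L[i, i - 1 : i + 2] = np.array([-1.0, 2.0, -1.0]) / h**2
    return L

def condition_on_linear_obs(K, L, y, noise_var=1e-10):
    """Zero-mean GP prior N(0, K) on grid values u, conditioned on y = L u + noise."""
    A = L @ K @ L.T + noise_var * np.eye(len(y))  # covariance of the observations
    cross = K @ L.T                                # Cov(u, L u)
    mean = cross @ np.linalg.solve(A, y)
    cov = K - cross @ np.linalg.solve(A, cross.T)
    return mean, cov

n = 51
x = np.linspace(0.0, 1.0, n)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)  # assumed prior, length scale 0.5
L = poisson_operator(n)
f = np.pi**2 * np.sin(np.pi * x)                     # right-hand side for u(x) = sin(pi x)
y = np.concatenate(([0.0], f[1:-1], [0.0]))          # boundary values and PDE targets
mean, cov = condition_on_linear_obs(K, L, y)
post_std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise posterior uncertainty
```

Here the remaining error of the posterior mean is the finite-difference discretization error (about 3e-4 at this grid size). Dropping some collocation rows would leave L with a nontrivial null space, and the posterior covariance would then express the resulting uncertainty in those directions.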

artificial intelligence, bayesian inference, machine learning

arXiv:2212.12474

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

Haas, Moritz, Holzmüller, David, von Luxburg, Ulrike, Steinwart, Ingo

arXiv.org Machine Learning, Oct-26-2023

The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator's derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that rate-optimal benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.
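
The mechanism behind spiky-smooth kernels can be illustrated numerically. We assume the kernel is a smooth Gaussian kernel plus a narrow Gaussian spike scaled by rho, i.e. k(x, x') = k_smooth(x, x') + rho * k_spike(x, x'); the widths, rho, and data below are illustrative choices, not the paper's settings or rates. When the spike width is far below the spacing of the training points, the kernel matrix is essentially K_smooth + rho * I. The minimum-norm interpolant therefore fits the noisy labels exactly at the training points, while between them it agrees with kernel ridge regression using the smooth kernel and ridge parameter rho.

```python
import numpy as np

def gauss(a, b, width):
    # Gaussian kernel matrix for 1D inputs a (m,) and b (n,)
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * width**2))

def spiky_smooth(a, b, width=0.2, spike_width=0.002, rho=0.1):
    # smooth component plus a narrow, rho-weighted spike (illustrative parameters)
    return gauss(a, b, width) + rho * gauss(a, b, spike_width)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 20)                              # training inputs, spacing ~0.053
y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(20)  # noisy labels
X_test = (X[:-1] + X[1:]) / 2                              # midpoints, far from every spike

# Minimum-norm interpolation with the spiky-smooth kernel.
alpha = np.linalg.solve(spiky_smooth(X, X), y)
train_pred = spiky_smooth(X, X) @ alpha
test_pred = spiky_smooth(X_test, X) @ alpha

# Kernel ridge regression with the smooth kernel alone and ridge rho = 0.1.
alpha_krr = np.linalg.solve(gauss(X, X, 0.2) + 0.1 * np.eye(20), y)
krr_pred = gauss(X_test, X, 0.2) @ alpha_krr
```

The estimator interpolates (train_pred equals y) yet predicts like a regularized smooth method elsewhere; the spikes are what give it the large derivatives the abstract identifies as necessary.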

artificial intelligence, deep learning, machine learning

arXiv:2305.14077

Country:

North America > United States (0.45)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

A Framework and Benchmark for Deep Batch Active Learning for Regression

Holzmüller, David, Zaverkin, Viktor, Kästner, Johannes, Steinwart, Ingo

arXiv.org Artificial Intelligence, Aug-1-2023

The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width neural tangent kernels and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.
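
To make "base kernel" and "selection method" concrete, here is a sketch of one simple greedy rule: farthest-point selection in the distance induced by a kernel, d(x, x')^2 = k(x, x) + k(x', x') - 2 k(x, x'). Assumptions: the base kernel is the linear kernel on some feature vectors (for instance last-layer features or sketched neural tangent kernel features), and each new batch element is the pool point farthest from everything already labeled or selected. This is a generic rule, not the clustering-based method proposed in the paper, and the function names are hypothetical.

```python
import numpy as np

def kernel_dist_sq(feats_a, feats_b):
    # Squared distance induced by the linear kernel k(x, x') = <phi(x), phi(x')>
    sq_a = np.sum(feats_a**2, axis=1)[:, None]
    sq_b = np.sum(feats_b**2, axis=1)[None, :]
    return np.maximum(sq_a + sq_b - 2.0 * feats_a @ feats_b.T, 0.0)

def greedy_max_dist(pool_feats, train_feats, batch_size):
    """Greedily pick pool points farthest from all labeled and already selected points."""
    # distance from each pool point to its nearest labeled point
    min_d = kernel_dist_sq(pool_feats, train_feats).min(axis=1)
    selected = []
    for _ in range(batch_size):
        idx = int(np.argmax(min_d))
        selected.append(idx)
        # the new point now counts as covered; update nearest-point distances
        new_d = kernel_dist_sq(pool_feats, pool_feats[idx : idx + 1])[:, 0]
        min_d = np.minimum(min_d, new_d)
    return selected

train = np.array([[0.0, 0.0]])
pool = np.array([[0.1, 0.0], [1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
batch = greedy_max_dist(pool, train, 2)  # -> [1, 2]
```

Point 3 is skipped even though it is far from the labeled point, because it lies close to point 1, which was selected first; this diversity within a batch is the reason for batch-aware selection.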

artificial intelligence, bayesian inference, machine learning

arXiv:2203.09410

Country:

Europe > Germany > Baden-Württemberg (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
Energy > Oil & Gas > Midstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Utilizing Expert Features for Contrastive Learning of Time-Series Representations

Nonnenmacher, Manuel, Oldenburg, Lukas, Steinwart, Ingo, Reeb, David

arXiv.org Machine Learning, Jun-23-2022

We present an approach that incorporates expert knowledge into time-series representation learning. Our method employs expert features in place of the data transformations commonly used in previous contrastive learning approaches. We do this because time-series data frequently stem from industrial or medical settings, where domain experts often provide such features, whereas suitable transformations are generally hard to define for time series. We start by proposing two properties that useful time-series representations should fulfill and show that current representation learning approaches do not ensure these properties. We therefore devise ExpCLR, a novel contrastive learning approach built on an objective that utilizes expert features to encourage both properties for the learned representation. Finally, we demonstrate on three real-world time-series datasets that ExpCLR surpasses several state-of-the-art methods for both unsupervised and semi-supervised representation learning.

data mining, expclr, machine learning

arXiv:2206.11517

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning

Nonnenmacher, Manuel, Pfeil, Thomas, Steinwart, Ingo, Reeb, David

arXiv.org Machine Learning, Oct-19-2021

Pruning neural networks reduces inference time and memory costs. On standard hardware, these benefits are especially prominent if coarse-grained structures, like feature maps, are pruned. We devise two novel saliency-based methods for second-order structured pruning (SOSP) which include correlations among all structures and layers. Our main method SOSP-H employs an innovative second-order approximation, which enables saliency evaluations by fast Hessian-vector products. We validate SOSP-H by comparing it to our second method SOSP-I, which uses a well-established Hessian approximation, and to numerous state-of-the-art methods. While SOSP-H performs on par or better in terms of accuracy, it has clear advantages in terms of scalability and efficiency. This allowed us to scale SOSP-H to large-scale vision tasks, even though it captures correlations across all layers of the network. To underscore the global nature of our pruning methods, we evaluate their performance not only by removing structures from a pretrained network, but also by detecting architectural bottlenecks. We show that our algorithms allow us to systematically reveal architectural bottlenecks, which we then remove to further increase the accuracy of the networks. Deep neural networks have consistently grown in size in recent years, with increasing performance. However, this increase in size leads to slower inference, higher computational requirements and higher cost. To reduce the size of the networks without affecting their performance, a large number of pruning algorithms have been proposed (e.g., LeCun et al., 1990; Hassibi et al., 1993; Reed, 1993; Han et al., 2015; Blalock et al., 2020). Pruning can either be unstructured, i.e., removing individual weights, or structured, i.e., removing entire substructures like nodes or channels.
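
The core quantity in second-order structured pruning can be sketched on a toy problem: the loss change when a group of weights (standing in for a structure such as a channel) is set to zero, estimated by the Taylor expansion g^T d + 1/2 d^T H d, where d = -w on the group and 0 elsewhere and H d comes from a Hessian-vector product rather than a stored Hessian. The sketch assumes a linear least-squares model, whose loss is quadratic so the estimate is exact, and hypothetical two-weight groups; it is a generic saliency, not the specific SOSP-H or SOSP-I criterion. The quadratic term couples weights through off-diagonal Hessian entries, which is how second-order criteria account for correlations.

```python
import numpy as np

def loss(w, X, y):
    # least-squares loss L(w) = ||X w - y||^2 / (2 n)
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)

def hvp(v, X):
    # Hessian-vector product H v = X^T X v / n, without forming H explicitly
    return X.T @ (X @ v) / X.shape[0]

def saliency(w, X, y, group):
    """Second-order Taylor estimate of the loss increase when w[group] is zeroed."""
    d = np.zeros_like(w)
    d[group] = -w[group]
    return grad(w, X, y) @ d + 0.5 * d @ hvp(d, X)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X @ rng.standard_normal(6) + 0.1 * rng.standard_normal(100)
w = np.linalg.lstsq(X, y, rcond=None)[0]      # "trained" weights
groups = [[0, 1], [2, 3], [4, 5]]             # hypothetical structures
scores = [saliency(w, X, y, g) for g in groups]
prune_first = groups[int(np.argmin(scores))]  # least important structure

def actual_increase(group):
    # true loss change after zeroing the group, for comparison with the estimate
    w_pruned = w.copy()
    w_pruned[group] = 0.0
    return loss(w_pruned, X, y) - loss(w, X, y)
```

For a real network the loss is not quadratic, so the same formula is only an approximation, and its cost is dominated by the Hessian-vector products.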

artificial intelligence, machine learning, neural network

arXiv:2110.11395

Country: Europe > Germany (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments

Zaverkin, Viktor, Holzmüller, David, Steinwart, Ingo, Kästner, Johannes

arXiv.org Machine Learning, Sep-20-2021

Approximate methods, such as empirical force fields (FFs) [1-3], are an integral part of modern computational chemistry and materials science. While the application of first-principles methods, such as density functional theory (DFT), to even moderately sized molecular and material systems is computationally very expensive, approximate methods allow for simulations of large systems over long time scales. Over the last decades, machine-learned potentials (MLPs) [4-33] have risen in popularity because they can be as accurate as the respective first-principles reference methods, transfer to systems of arbitrary size, and, unlike empirical FFs [34], describe bond breaking and bond formation. The interpolation abilities of neural networks (NNs) [35] have promoted their broad application in computational chemistry and materials science. NNs were initially applied to represent potential energy surfaces (PESs) of small atomistic systems [36, 37] and were later extended to high-dimensional systems [21].

artificial intelligence, neural network, simulation

arXiv:2109.09569

Country:

Europe > Germany > Baden-Württemberg (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report (1.00)

Industry:

Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)
Energy > Energy Storage (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Which Minimizer Does My Neural Network Converge To?

Nonnenmacher, Manuel, Reeb, David, Steinwart, Ingo

arXiv.org Machine Learning, Nov-4-2020

The loss surface of an overparameterized neural network (NN) possesses many global minima of zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the size of the initialization of a strongly overparameterized NN affects the minimizer and can deteriorate its final test performance. We propose a strategy to limit this effect. Then, we demonstrate that for adaptive optimization methods such as AdaGrad, the obtained minimizer generally differs from the gradient descent (GD) minimizer. This adaptive minimizer is changed further by stochastic mini-batch training, even though in the non-adaptive case GD and stochastic GD result in essentially the same minimizer. Lastly, we explain that these effects remain relevant for less overparameterized NNs. While overparameterization has its benefits, our work highlights that it induces sources of error absent from underparameterized models, some of which can be challenging to control.
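
The initialization effect can already be seen in linear least squares, used here as an assumed stand-in for a strongly overparameterized network. Gradient descent only moves the weights within the row space of the data matrix X, so from an initialization w0 it converges to the interpolating solution closest to w0, namely w0 + X^+ (y - X w0), where X^+ is the pseudoinverse. The component of w0 orthogonal to the row space is never corrected, so a larger initialization leaves a larger, data-independent part in the final weights. The sketch covers plain GD only, not the AdaGrad or mini-batch cases, and the function names are hypothetical.

```python
import numpy as np

def gd_least_squares(X, y, w0, steps=2000):
    n = X.shape[0]
    lr = n / np.linalg.norm(X, 2) ** 2  # 1 / largest eigenvalue of the Hessian X^T X / n
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w

def predicted_limit(X, y, w0):
    # GD never changes the part of w0 orthogonal to the row space of X,
    # so it converges to the interpolant closest to the initialization.
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))  # 5 samples, 20 parameters: strongly overparameterized
y = rng.standard_normal(5)
small_init = 0.01 * rng.standard_normal(20)
large_init = 10.0 * small_init
w_small = gd_least_squares(X, y, small_init)
w_large = gd_least_squares(X, y, large_init)
```

Both runs reach zero training error, but they end at different global minima, and the difference is exactly the null-space part of the initialization, which can change predictions on new inputs.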

deep learning, initialization, neural network

arXiv:2011.02408

Country:

North America (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Optimal learning rates for least squares SVMs using Gaussian kernels

Eberts, Mona, Steinwart, Ingo

Neural Information Processing Systems, Feb-14-2020, 22:59:09 GMT

We prove a new oracle inequality for support vector machines with Gaussian RBF kernels solving the regularized least squares regression problem. To this end, we apply the modulus of smoothness. With the help of the new oracle inequality we then derive learning rates that can also be achieved by a simple data-dependent parameter selection method. Finally, it turns out that our learning rates are asymptotically optimal for regression functions satisfying certain standard smoothness conditions.
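
For reference, the estimator can be sketched as follows: a least squares SVM with Gaussian RBF kernel minimizes lambda * ||f||_H^2 + (1/n) * sum_i (f(x_i) - y_i)^2 over the kernel's RKHS H, and by the representer theorem the solution is f = sum_i alpha_i k(x_i, .) with (K + n * lambda * I) alpha = y. The kernel parametrization exp(-||x - x'||^2 / gamma^2), the parameter grids, the synthetic data, and the hold-out split used to choose (lambda, gamma) are illustrative assumptions; the abstract only states that a simple data-dependent parameter selection method attains the rates, and this is one plausible instance, not necessarily the one analyzed in the paper.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # k(x, x') = exp(-||x - x'||^2 / gamma^2), gamma = kernel width
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / gamma**2)

def fit_ls_svm(X, y, lam, gamma):
    # Minimizes lam * ||f||_H^2 + (1/n) * sum_i (f(x_i) - y_i)^2 over the RKHS;
    # the minimizer is f = sum_i alpha_i k(x_i, .) with (K + n*lam*I) alpha = y.
    n = len(y)
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Z: gaussian_kernel(Z, X, gamma) @ alpha

def select_by_holdout(X_tr, y_tr, X_val, y_val, lams, gammas):
    """Pick (lam, gamma) with the smallest validation mean squared error."""
    best = None
    for lam in lams:
        for gamma in gammas:
            f = fit_ls_svm(X_tr, y_tr, lam, gamma)
            err = np.mean((f(X_val) - y_val) ** 2)
            if best is None or err < best[0]:
                best = (err, lam, gamma)
    return best

rng = np.random.default_rng(0)
X_all = rng.uniform(0.0, 1.0, size=(150, 1))
y_all = np.sin(2 * np.pi * X_all[:, 0]) + 0.1 * rng.standard_normal(150)
lams, gammas = [1e-6, 1e-4, 1e-2], [0.05, 0.2, 1.0]
best_err, best_lam, best_gamma = select_by_holdout(
    X_all[:100], y_all[:100], X_all[100:], y_all[100:], lams, gammas
)
```

Solving the regularized linear system directly costs O(n^3), which is fine for a sketch of this size.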

artificial intelligence, gaussian kernel, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.74)
