Dirksen, Sjoerd
Memorization with neural nets: going beyond the worst case
Dirksen, Sjoerd, Finke, Patrick, Genzel, Martin
In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
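As a point of reference for what "interpolation" means here, the following is a minimal numpy sketch. It is not the paper's randomized three-layer construction; it only illustrates the classical worst-case-style baseline in which a random-feature ReLU network of width at least the number of samples fits arbitrary labels exactly through a linear readout. All sizes and data below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, n = 200, 10, 300                 # samples, input dimension, hidden width (>= N)

X = rng.standard_normal((N, d))        # data points
y = rng.choice([-1.0, 1.0], size=N)    # arbitrary +-1 labels to be memorized

W = rng.standard_normal((n, d))        # random first-layer weights
b = rng.uniform(-3.0, 3.0, size=n)     # random biases
Phi = np.maximum(X @ W.T + b, 0.0)     # hidden ReLU activations, shape (N, n)

# With n >= N the feature matrix generically has full row rank, so a linear
# readout c solving the least-squares problem reproduces every label exactly.
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("max interpolation error:", np.max(np.abs(Phi @ c - y)))
```

A near-zero printed error indicates that the network interpolates the labels; the paper's point is that, for benignly structured classes, far fewer parameters than this worst-case width suffice.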
The Separation Capacity of Random Neural Networks
Dirksen, Sjoerd, Genzel, Martin, Jacques, Laurent, Stollenwerk, Alexander
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $\mathcal{X}^-, \mathcal{X}^+ \subset \mathbb{R}^d$ (with positive distance) linearly separable? We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets $\mathcal{X}^-, \mathcal{X}^+$ and their mutual arrangement. This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (based on a localized version of Gaussian mean width), which leads to sound and informative separation guarantees. We connect our result with related lines of work on approximation, memorization, and generalization.
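A minimal empirical sketch of the setting described above, with made-up data and widths: a random ReLU layer with standard Gaussian weights and uniformly distributed biases is applied to two classes that are not linearly separable in the input space, and a linear classifier on the random features is used to check separability after the lift. This only illustrates the statement; it is not the paper's proof or a prescribed parameter choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_neurons, lam = 200, 3.0                          # hidden width and bias range (made up)

# X^-: points on a circle of radius 1, X^+: points on a circle of radius 3
# (positive mutual distance, but not linearly separable in the input space)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
radius = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
y = np.where(np.arange(200) < 100, -1.0, 1.0)

W = rng.standard_normal((n_neurons, 2))            # standard Gaussian weights
b = rng.uniform(-lam, lam, size=n_neurons)         # uniformly distributed biases
features = np.maximum(X @ W.T + b, 0.0)            # ReLU layer: ReLU(Wx + b)

# A linear classifier on top of the random features; training accuracy 1.0
# certifies that this draw of (W, b) renders the two classes linearly separable.
clf = LogisticRegression(C=100.0, max_iter=5000).fit(features, y)
print("training accuracy on the lifted data:", clf.score(features, y))
```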
Statistical post-processing of wind speed forecasts using convolutional neural networks
Veldkamp, Simon, Whan, Kirien, Dirksen, Sjoerd, Schmeits, Maurice
Current statistical post-processing methods for probabilistic weather forecasting are not capable of using full spatial patterns from the numerical weather prediction (NWP) model. In this paper we incorporate spatial wind speed information by using convolutional neural networks (CNNs) and obtain probabilistic wind speed forecasts in the Netherlands for 48 hours ahead, based on KNMI's Harmonie-Arome NWP model. The CNNs are shown to have higher Brier skill scores for medium to higher wind speeds, as well as a better continuous ranked probability score (CRPS), than fully connected neural networks and quantile regression forests.
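For reference, a minimal sketch of the continuous ranked probability score (CRPS) mentioned above, evaluated for an empirical (ensemble-style) forecast. The forecast members and observation are invented; this is not the paper's CNN model or its verification pipeline.

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """CRPS of the empirical forecast distribution given by `members` against `obs`.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'| for X, X' i.i.d. draws
    from the forecast distribution (here: the empirical ensemble distribution).
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# e.g. 20 hypothetical wind speed members (m/s) against an observed 9.3 m/s
forecast = np.random.default_rng(2).normal(loc=8.0, scale=1.5, size=20)
print("CRPS:", crps_ensemble(forecast, 9.3))
```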
Toward a unified theory of sparse dimensionality reduction in Euclidean space
Bourgain, Jean, Dirksen, Sjoerd, Nelson, Jelani
Let $\Phi\in\mathbb{R}^{m\times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeros per column. For a subset $T$ of the unit sphere and a given $\varepsilon\in(0,1/2)$, we study the settings of $m$ and $s$ required to ensure $$ \mathop{\mathbb{E}}_\Phi \sup_{x\in T} \left|\|\Phi x\|_2^2 - 1 \right| < \varepsilon , $$ i.e., so that $\Phi$ preserves the norm of every $x\in T$ simultaneously and multiplicatively up to $1+\varepsilon$. We introduce a new complexity parameter, which depends on the geometry of $T$, and show that it suffices to choose $s$ and $m$ such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense $\Phi$ having i.i.d. Gaussian entries. We qualitatively unify several results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and Fourier-based restricted isometries. Our work also implies new results on the use of the sparse Johnson-Lindenstrauss transform in numerical linear algebra, classical and model-based compressed sensing, manifold learning, and constrained least squares problems such as the Lasso.
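A minimal sketch, with made-up sizes, of one standard sparse Johnson-Lindenstrauss construction in the spirit of [KN14] ($s$ random $\pm 1/\sqrt{s}$ entries per column) together with an empirical evaluation of the quantity $\sup_{x\in T} |\|\Phi x\|_2^2 - 1|$ on a finite set $T$ of unit vectors. It illustrates the objects in the abstract, not the paper's complexity parameter or proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 1000, 200, 8                 # ambient dim, target dim, non-zeros per column

Phi = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=s, replace=False)      # s distinct rows per column
    Phi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

# T: a finite set of random unit vectors, standing in for a structured subset of
# the sphere; the printed value is the empirical sup over T of | ||Phi x||^2 - 1 |.
T = rng.standard_normal((50, n))
T /= np.linalg.norm(T, axis=1, keepdims=True)
print("max distortion over T:", np.max(np.abs(np.sum((T @ Phi.T) ** 2, axis=1) - 1.0)))
```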
Dimensionality reduction with subgaussian matrices: a unified theory
Dirksen, Sjoerd
We present a theory for Euclidean dimensionality reduction with subgaussian matrices which unifies several restricted isometry property and Johnson-Lindenstrauss type results obtained earlier for specific data sets. In particular, we recover and, in several cases, improve results for sets of sparse and structured sparse vectors, low-rank matrices and tensors, and smooth manifolds. In addition, we establish a new Johnson-Lindenstrauss embedding for data sets taking the form of an infinite union of subspaces of a Hilbert space.
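As an illustration of the type of statement being unified, the following sketch applies a subgaussian (here Rademacher) matrix, scaled by $1/\sqrt{m}$, to a set of sparse unit vectors and records the resulting Johnson-Lindenstrauss-type distortion. Dimensions and sparsity are invented, and no quantitative claim from the paper is being reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, s, num = 1000, 250, 10, 200      # ambient dim, embedding dim, sparsity, #test vectors

A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # subgaussian (Rademacher) matrix

X = np.zeros((num, n))                 # random s-sparse unit vectors
for i in range(num):
    support = rng.choice(n, size=s, replace=False)
    v = rng.standard_normal(s)
    X[i, support] = v / np.linalg.norm(v)

# empirical JL-type distortion | ||Ax||_2^2 - 1 | over the sparse test set
distortion = np.abs(np.linalg.norm(X @ A.T, axis=1) ** 2 - 1.0)
print("max distortion over the sparse test set:", np.max(distortion))
```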