Collaborating Authors: Unser, Michael


On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

arXiv.org Artificial Intelligence

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional-optimization problem and investigate its main properties. In particular, we show that, as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly, even beyond these extreme points.
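
To make the lifting concrete: an infinite-width 2-layer ReLU network can be written as an integral over neurons with respect to a measure $\mu$, and the map $\mu \mapsto f_\mu$ is linear, which is what opens the door to a convex reformulation. The display below is only a schematic of this standard construction (the loss $\ell$, the divergence $D_\tau$, and the initial neuron distribution $\mu_0$ at scale $\tau$ are placeholder notation, not the paper's exact functional):

$$ f_\mu(x) \;=\; \int a\,\max(\langle w, x\rangle, 0)\,\mathrm{d}\mu(w, a), \qquad \min_{\mu}\; \sum_{i=1}^{n} \ell\bigl(f_\mu(x_i), y_i\bigr) \;+\; \lambda\, D_\tau(\mu, \mu_0). $$

The regime reached at either end of the initialization scale is characterized precisely in the paper.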


Stability of Image-Reconstruction Algorithms

arXiv.org Artificial Intelligence

Robustness and stability of image-reconstruction algorithms have recently come under scrutiny. Their importance to medical imaging cannot be overstated. We review the known results for the topical variational regularization strategies ($\ell_2$ and $\ell_1$ regularization) and present novel stability results for $\ell_p$-regularized linear inverse problems for $p\in(1,\infty)$. Our results guarantee Lipschitz continuity for small $p$ and H\"{o}lder continuity for larger $p$. They generalize well to the $L_p(\Omega)$ function spaces.
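
As a quick way to see what such stability statements quantify, the following sketch (illustrative only; the matrix, the regularization weight, and the exponent are arbitrary choices, and the solver is an off-the-shelf quasi-Newton routine rather than anything from the paper) solves the $\ell_p$-regularized least-squares problem for perturbed data and reports the ratio between the change in the reconstruction and the size of the data perturbation:

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative probe of solution stability for l_p-regularized least squares;
    # sizes, lambda, and p are arbitrary choices (not from the paper).
    rng = np.random.default_rng(1)
    m, n, lam, p = 20, 50, 0.1, 1.5
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = rng.standard_normal(m)

    def solve(y):
        def cost(x):
            r = A @ x - y
            return 0.5 * r @ r + lam * np.sum(np.abs(x) ** p)
        def grad(x):
            return A.T @ (A @ x - y) + lam * p * np.abs(x) ** (p - 1) * np.sign(x)
        return minimize(cost, np.zeros(n), jac=grad, method="L-BFGS-B").x

    x0 = solve(y)
    for eps in [1e-3, 1e-2, 1e-1]:
        delta = eps * rng.standard_normal(m)
        x1 = solve(y + delta)
        print(eps, np.linalg.norm(x1 - x0) / np.linalg.norm(delta))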


Asymptotic Stability in Reservoir Computing

arXiv.org Machine Learning

Reservoir Computing is a class of Recurrent Neural Networks with internal weights fixed at random. Stability relates to the sensitivity of the network state to perturbations. It is an important property in Reservoir Computing as it directly impacts performance. In practice, it is desirable to stay in a stable regime, where the effect of perturbations does not explode exponentially, but also close to the chaotic frontier where reservoir dynamics are rich. Open questions remain today regarding input regularization and discontinuous activation functions. In this work, we use the recurrent kernel limit to draw new insights on stability in reservoir computing. This limit corresponds to large reservoir sizes, and it already becomes relevant for reservoirs with a few hundred neurons. We obtain a quantitative characterization of the frontier between stability and chaos, which can greatly benefit hyperparameter tuning. In a broader sense, our results contribute to understanding the complex dynamics of Recurrent Neural Networks.
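
The standard empirical counterpart of this notion of stability is to run two copies of the same reservoir from nearby states and check whether their trajectories converge or separate. The sketch below (a generic echo-state-network recursion with tanh activations; the sizes, scalings, and the gain rho are illustrative assumptions, not the paper's setting) prints the final state discrepancy for a few recurrent gains:

    import numpy as np

    # Illustrative stability test for a random-weight reservoir (not the paper's code).
    rng = np.random.default_rng(0)
    N, T = 300, 200                                   # reservoir size, time steps
    W_in = 0.5 * rng.standard_normal(N)               # input weights
    u = rng.standard_normal(T)                        # shared input sequence

    def final_discrepancy(rho):
        W = rng.standard_normal((N, N)) / np.sqrt(N)  # random recurrent weights
        x_a = 0.1 * rng.standard_normal(N)
        x_b = x_a + 1e-6 * rng.standard_normal(N)     # tiny initial perturbation
        for t in range(T):
            x_a = np.tanh(rho * W @ x_a + W_in * u[t])
            x_b = np.tanh(rho * W @ x_b + W_in * u[t])
        return np.linalg.norm(x_a - x_b)

    for rho in [0.5, 1.0, 1.5, 2.0]:
        print(rho, final_discrepancy(rho))   # decays in the stable regime, not in the chaotic one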


Measuring Complexity of Learning Schemes Using Hessian-Schatten Total-Variation

arXiv.org Machine Learning

In this paper, we introduce the Hessian-Schatten total-variation (HTV), a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that HTV is invariant to rotations, scalings, and translations. Additionally, its minimum value is achieved for linear mappings, supporting the common intuition that linear regression is the least complex learning model. Next, we present closed-form expressions for computing the HTV of two general classes of functions. The first one is the class of Sobolev functions with a certain degree of regularity, for which we show that HTV coincides with the Hessian-Schatten seminorm that is sometimes used as a regularizer for image reconstruction. The second one is the class of continuous and piecewise-linear (CPWL) functions. In this case, we show that the HTV reflects the total change in slopes between linear regions that have a common facet. Hence, it can be viewed as a convex ($\ell_1$-type) relaxation of the number of linear regions ($\ell_0$-type) of CPWL mappings. Finally, we illustrate the use of our proposed seminorm with some concrete examples.
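
For intuition, a discrete stand-in for the Hessian-Schatten seminorm can be computed by summing the nuclear norm of a finite-difference Hessian over a grid. The sketch below (an illustrative approximation, not the paper's continuous-domain definition) shows that the value vanishes for an affine map and is positive for a creased or curved one:

    import numpy as np

    # Discrete stand-in for the Hessian-Schatten seminorm (illustrative only).
    def hessian_schatten_1(f, h=1.0):
        fxx = (f[2:, 1:-1] - 2 * f[1:-1, 1:-1] + f[:-2, 1:-1]) / h**2
        fyy = (f[1:-1, 2:] - 2 * f[1:-1, 1:-1] + f[1:-1, :-2]) / h**2
        fxy = (f[2:, 2:] - f[2:, :-2] - f[:-2, 2:] + f[:-2, :-2]) / (4 * h**2)
        total = 0.0
        for a, b, c in zip(fxx.ravel(), fyy.ravel(), fxy.ravel()):
            H = np.array([[a, c], [c, b]])
            total += np.abs(np.linalg.eigvalsh(H)).sum()  # nuclear norm of a symmetric 2x2 Hessian
        return total

    x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64), indexing="ij")
    print(hessian_schatten_1(2 * x - 3 * y + 1))   # affine map: essentially zero
    print(hessian_schatten_1(np.abs(x)))           # CPWL map with one crease: positive
    print(hessian_schatten_1(x**2 + y**2))         # smooth curved map: positive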


Native Banach spaces for splines and variational inverse problems

arXiv.org Machine Learning

We propose a systematic construction of native Banach spaces for general spline-admissible operators ${\rm L}$. In short, the native space for ${\rm L}$ and the (dual) norm $\|\cdot\|_{\mathcal{X}'}$ is the largest space of functions $f: \mathbb{R}^d \to \mathbb{R}$ such that $\|{\rm L} f\|_{\mathcal{X}'}<\infty$, subject to the constraint that the growth-restricted null space of ${\rm L}$ be finite-dimensional. This space, denoted by $\mathcal{X}'_{\rm L}$, is specified as the dual of the pre-native space $\mathcal{X}_{\rm L}$, which is itself obtained through a suitable completion process. The main difference with prior constructions (e.g., reproducing kernel Hilbert spaces) is that our approach involves test functions rather than sums of atoms (e.g., kernels), which makes it applicable to a much broader class of norms, including total variation. Under specific admissibility and compatibility hypotheses, we lay out the direct-sum topology of $\mathcal{X}_{\rm L}$ and $\mathcal{X}'_{\rm L}$, and identify the whole family of equivalent norms. Our construction ensures that the native space and its pre-dual are endowed with a fundamental Schwartz-Banach property. In practical terms, this means that $\mathcal{X}'_{\rm L}$ is rich enough to reproduce any function with an arbitrary degree of precision.
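
Stated informally, the defining condition on the native space is

$$ \mathcal{X}'_{\rm L} \;=\; \bigl\{ f: \mathbb{R}^d \to \mathbb{R} \;:\; \|{\rm L} f\|_{\mathcal{X}'} < \infty \bigr\}, $$

understood together with the requirement that the growth-restricted null space of ${\rm L}$ be finite-dimensional; the rigorous construction proceeds by duality from the pre-native space $\mathcal{X}_{\rm L}$.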


A unifying representer theorem for inverse problems and machine learning

arXiv.org Machine Learning

The standard approach for dealing with the ill-posedness of the training problem in machine learning and/or the reconstruction of a signal from a limited number of measurements is regularization. The method is applicable whenever the problem is formulated as an optimization task. The standard strategy consists in augmenting the original cost functional by an energy that penalizes solutions with undesirable behavior. The effect of regularization is very well understood when the penalty involves a Hilbertian norm. Another popular configuration is the use of an $\ell_1$-norm (or some variant thereof) that favors sparse solutions. In this paper, we propose a higher-level formulation of regularization within the context of Banach spaces. We present a general representer theorem that characterizes the solutions of a remarkably broad class of optimization problems. We then use our theorem to retrieve a number of known results in the literature (e.g., the celebrated representer theorem of machine learning for RKHS, Tikhonov regularization, representer theorems for sparsity-promoting functionals, the recovery of spikes), as well as a few new ones.
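
Schematically, and with notation assumed here for illustration rather than quoted from the paper, the results concern problems of the form

$$ \min_{f \in \mathcal{X}} \; \sum_{m=1}^{M} E\bigl(y_m, \langle \nu_m, f \rangle\bigr) \;+\; \psi\bigl(\|f\|_{\mathcal{X}}\bigr), $$

where the $\nu_m$ are linear measurement functionals, $E$ is a data-fidelity term, and $\psi$ is an increasing function of the Banach norm; the representer theorem then describes solutions as finite linear combinations of extremal atoms of the regularization ball, with the number of atoms tied to the number $M$ of measurements. The precise hypotheses and the exact form of the statement are given in the paper.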


An L1 Representer Theorem for Multiple-Kernel Regression

arXiv.org Machine Learning

The theory of RKHS provides an elegant framework for supervised learning. It is the foundation of all kernel methods in machine learning. Implicit in its formulation is the use of a quadratic regularizer associated with the underlying inner product which imposes smoothness constraints. In this paper, we consider instead the generalized total-variation (gTV) norm as the sparsity-promoting regularizer. This leads us to propose a new Banach-space framework that justifies the use of generalized LASSO, albeit in a slightly modified version. We prove a representer theorem for multiple-kernel regression (MKR) with gTV regularization. The theorem states that the solutions of MKR have kernel expansions with adaptive positions, while the gTV norm enforces an $\ell_1$ penalty on the coefficients. We discuss the sparsity-promoting effect of the gTV norm which prevents redundancy in the multiple-kernel scenario.
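
To see the mechanism on a finite problem, the sketch below (an illustrative stand-in, not the paper's algorithm or its continuous-domain gTV norm) builds a dictionary from several kernels centered at the data points and fits the expansion coefficients with an $\ell_1$ penalty via plain ISTA, so that only a few (kernel, position) atoms remain active:

    import numpy as np

    # Finite-dimensional stand-in for multiple-kernel regression with an l1 penalty
    # (illustrative only; not the paper's algorithm).
    rng = np.random.default_rng(0)
    n = 60
    x = np.sort(rng.uniform(-3, 3, n))
    y = np.sinc(x) + 0.05 * rng.standard_normal(n)

    kernels = [lambda r: np.exp(-r**2 / (2 * 0.3**2)),   # narrow Gaussian
               lambda r: np.exp(-r**2 / (2 * 1.0**2)),   # wide Gaussian
               lambda r: np.exp(-np.abs(r) / 0.5)]       # Laplacian
    D = np.hstack([k(x[:, None] - x[None, :]) for k in kernels])   # (n, 3n) dictionary

    lam = 0.05
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(2000):                         # ISTA for 0.5*||Dc - y||^2 + lam*||c||_1
        c = c - D.T @ (D @ c - y) / L
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)

    print("active atoms:", np.count_nonzero(np.abs(c) > 1e-8), "out of", c.size)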


A representer theorem for deep neural networks

arXiv.org Machine Learning

We propose to optimize the activation functions of a deep neural network by adding a corresponding functional regularization to the cost function. We justify the use of a second-order total-variation criterion. This allows us to derive a general representer theorem for deep neural networks that makes a direct connection with splines and sparsity. Specifically, we show that the optimal network configuration can be achieved with activation functions that are nonuniform linear splines with adaptive knots. The bottom line is that the action of each neuron is encoded by a spline whose parameters (including the number of knots) are optimized during the training procedure. The scheme results in a computational structure that is compatible with the existing deep-ReLU and MaxOut architectures. It also suggests novel optimization challenges, while making the link with $\ell_1$ minimization and sparsity-promoting techniques explicit.
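
The core object is an activation parameterized as a linear term plus a sum of shifted ReLUs, whose second-order total variation is simply the $\ell_1$ norm of the ReLU coefficients. The snippet below is a minimal one-activation sketch of that parameterization (the knot grid and names are illustrative choices, not the paper's implementation):

    import numpy as np

    # One-activation sketch of a spline (CPWL) activation with candidate knots
    # (illustrative parameterization, not the paper's implementation).
    knots = np.linspace(-3, 3, 31)                      # candidate knot locations

    def activation(t, b, w, a):
        # sigma(t) = b + w*t + sum_k a_k * max(t - knot_k, 0): a CPWL function
        return b + w * t + np.maximum(t[:, None] - knots[None, :], 0.0) @ a

    def second_order_tv(a):
        return np.sum(np.abs(a))                        # sum of the slope changes at the knots

    t = np.linspace(-4, 4, 200)
    a = np.zeros(knots.size)
    a[15] = 1.0                                         # one active knot at 0
    print(np.allclose(activation(t, 0.0, 0.0, a), np.maximum(t, 0.0)))  # reduces to a plain ReLU
    print(second_order_tv(a))                           # TV(2) = 1: a single unit jump in slope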


Learning Convex Regularizers for Optimal Bayesian Denoising

arXiv.org Machine Learning

We propose a data-driven algorithm for the maximum a posteriori (MAP) estimation of stochastic processes from noisy observations. The primary statistical properties of the sought signal are specified by the penalty function (i.e., the negative logarithm of the prior probability density function). Our alternating direction method of multipliers (ADMM)-based approach translates the estimation task into successive applications of the proximal mapping of the penalty function. Capitalizing on this direct link, we define the proximal operator as a parametric spline curve and optimize the spline coefficients by minimizing the average reconstruction error for a given training set. The key aspects of our learning method are that the associated penalty function is constrained to be convex and that the convergence of the ADMM iterations is proven. As a result of these theoretical guarantees, the adaptation of the proposed framework to different levels of measurement noise is extremely simple and does not require any retraining. We apply our method to the estimation of both sparse and non-sparse models of L\'{e}vy processes for which the minimum mean-square error (MMSE) estimators are available. We carry out a single training session and perform comparisons at various signal-to-noise ratio (SNR) values. Simulations illustrate that the performance of our algorithm is practically identical to that of the MMSE estimator, irrespective of the noise power.
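
For orientation, the sketch below spells out the kind of ADMM loop in which such a learned proximal curve would sit: the prior enters only through the pointwise function prox_curve, which in the paper is a trained spline constrained to correspond to a convex penalty and which is replaced here by a soft-thresholding placeholder. The forward model, signal model, and parameters are illustrative assumptions:

    import numpy as np

    # Schematic ADMM denoiser in which the prior enters only through a pointwise
    # proximal curve (a placeholder here; a trained convex-penalty spline in the paper).
    def prox_curve(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # stand-in for the learned prox

    def admm_denoise(y, lam=0.5, rho=1.0, iters=200):
        n = len(y)
        L = np.eye(n) - np.eye(n, k=-1)                  # finite-difference (increment) operator
        Q = np.linalg.inv(np.eye(n) + rho * L.T @ L)     # x-update matrix (identity forward model)
        x, z, u = y.copy(), L @ y, np.zeros(n)
        for _ in range(iters):
            x = Q @ (y + rho * L.T @ (z - u))            # quadratic x-update
            z = prox_curve(L @ x + u, lam / rho)         # proximal step on the increments
            u = u + L @ x - z                            # dual update
        return x

    rng = np.random.default_rng(0)
    signal = np.cumsum(rng.standard_normal(200) * (rng.random(200) < 0.05))  # sparse-increment signal
    y = signal + 0.1 * rng.standard_normal(200)
    print(np.linalg.norm(admm_denoise(y) - signal) / np.linalg.norm(y - signal))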


Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Neural Information Processing Systems

We consider the estimation of an i.i.d. vector $\mathbf{x} \in \mathbb{R}^n$ from measurements $\mathbf{y} \in \mathbb{R}^m$ obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector $\mathbf{x}$. The proposed algorithm is a generalization of a recently developed method by Vila and Schniter that uses expectation-maximization (EM) iterations where the posteriors in the E-steps are computed via approximate message passing. The techniques can be applied to a large class of learning problems, including the learning of sparse priors in compressed sensing or the identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that, for large i.i.d. Gaussian transform matrices, the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state-evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general, and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees.
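
To make the structure of such message-passing recursions concrete, the sketch below runs the plain AMP iteration with a soft-thresholding denoiser, which is the classical special case that GAMP and adaptive GAMP generalize; the EM-style parameter adaptation of the paper is not reproduced and is replaced by a fixed threshold multiplier. Sizes and noise level are illustrative:

    import numpy as np

    # Plain AMP with a soft-thresholding denoiser (the classical special case that
    # GAMP and adaptive GAMP generalize); illustrative sizes, no EM parameter learning.
    rng = np.random.default_rng(0)
    n, m, k = 500, 250, 25
    A = rng.standard_normal((m, n)) / np.sqrt(m)        # i.i.d. Gaussian transform
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    y = A @ x_true + 0.01 * rng.standard_normal(m)

    def soft(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    x, z = np.zeros(n), y.copy()
    for _ in range(30):
        tau = 2.0 * np.sqrt(np.mean(z**2))              # estimate of the effective noise level
        x_new = soft(x + A.T @ z, tau)                  # componentwise denoising step
        z = y - A @ x_new + z * np.count_nonzero(x_new) / m   # residual with Onsager correction
        x = x_new

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))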