AITopics | Mhaskar, H. N.

Mhaskar, H. N.

Learning on manifolds without manifold learning

Mhaskar, H. N., O'Dowd, Ryan

arXiv.org Machine Learning, Feb-19-2024

Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning. In contrast to the prevalent paradigm of solving this problem by minimizing a loss functional, we have given a direct one-shot construction together with optimal error bounds under the manifold assumption; i.e., one assumes that the data is sampled from an unknown sub-manifold of a high dimensional Euclidean space. A great deal of research deals with obtaining information about this manifold, such as the eigendecomposition of the Laplace-Beltrami operator or coordinate charts, and using this information for function approximation. This two-step approach implies some extra errors in the approximation stemming from basic quantities of the data in addition to the errors inherent in function approximation. In Neural Networks, 132:253-268, 2020, we have proposed a one-shot direct method to achieve function approximation without requiring the extraction of any information about the manifold other than its dimension. However, one cannot pin down the class of approximants used in that paper. In this paper, we view the unknown manifold as a sub-manifold of an ambient hypersphere and study the question of constructing a one-shot approximation using the spherical polynomials based on the hypersphere. Our approach does not require preprocessing of the data to obtain information about the manifold other than its dimension. We give optimal rates of approximation for relatively "rough" functions.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2402.12687

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.95)

Local transfer learning from one data space to another

Mhaskar, H. N., O'Dowd, Ryan

arXiv.org Artificial Intelligence, Jul-7-2023

A fundamental problem in manifold learning is to approximate a functional relationship in data chosen randomly from a probability distribution supported on a low dimensional sub-manifold of a high dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining subsets of the target data space on which the lifting can be defined, and how the local smoothness of the function and its lifting are related.

artificial intelligence, data space, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.0016

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.81)

Super-resolution meets machine learning: approximation of measures

Mhaskar, H. N.

arXiv.org Machine Learning, Jul-10-2019

The problem of super-resolution in general terms is to recuperate a finitely supported measure $\mu$ given finitely many of its coefficients $\hat{\mu}(k)$ with respect to some orthonormal system. The interesting case concerns situations where the number of coefficients required is substantially smaller than a power of the reciprocal of the minimal separation among the points in the support of $\mu$. In this paper, we consider the more severe problem of recuperating $\mu$ approximately without any assumption on $\mu$ beyond having a finite total variation. In particular, $\mu$ may be supported on a continuum, so that the minimal separation among the points in the support of $\mu$ is $0$. A variant of this problem is also of interest in machine learning as well as the inverse problem of de-convolution. We define an appropriate notion of a distance between the target measure and its recuperated version, give an explicit expression for the recuperation operator, and estimate the distance between $\mu$ and its approximation. We show that these estimates are the best possible in many different ways. We also explain why for a finitely supported measure the approximation quality of its recuperation is bounded from below if the amount of information is smaller than what is demanded in the super-resolution problem.
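The setting in this abstract can be illustrated with a minimal sketch. The abstract does not specify the orthonormal system or the recuperation operator, so everything concrete here is an assumption made for illustration: the system is taken to be the trigonometric one on the circle, the measure is a weighted sum of point masses, and the operator is a plain Fejér-filtered partial sum (a standard positive summability kernel swapped in, not the paper's operator). The names `fourier_coefficients` and `fejer_recovery` are hypothetical.

```python
import cmath
import math


def fourier_coefficients(points, weights, n):
    """Return {k: mu_hat(k)} for |k| < n, where mu = sum_j weights[j] * delta(points[j]).

    Assumption: trigonometric system on [0, 2*pi), mu_hat(k) = integral of exp(-i k x) d mu(x).
    """
    return {
        k: sum(a * cmath.exp(-1j * k * x) for x, a in zip(points, weights))
        for k in range(-n + 1, n)
    }


def fejer_recovery(coeffs, n, x):
    """Evaluate sigma_n(x) = sum_{|k|<n} (1 - |k|/n) * mu_hat(k) * exp(i k x).

    This is the measure convolved with the Fejer kernel: a smooth function
    that concentrates near the support of mu as n grows.
    """
    total = sum(
        (1 - abs(k) / n) * c * cmath.exp(1j * k * x) for k, c in coeffs.items()
    )
    return total.real


# Hypothetical example: two point masses and 2n - 1 known coefficients.
points, weights = [1.0, 2.5], [1.0, 0.5]
n = 64
coeffs = fourier_coefficients(points, weights, n)

# Locate the largest value of the smoothed measure on a uniform grid.
grid = [2 * math.pi * i / 512 for i in range(512)]
values = [fejer_recovery(coeffs, n, x) for x in grid]
peak = grid[max(range(len(grid)), key=values.__getitem__)]
```

The smoothed function peaks close to the support points, with heights proportional to the weights, and stays small elsewhere. Because only finitely many coefficients are used, the output is an approximation of the measure in a weak sense rather than an exact recovery of the point locations.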

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1907.04895

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Function approximation by deep networks

Mhaskar, H. N., Poggio, T.

arXiv.org Machine Learning, May-30-2019

We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to "lift" theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network related closely with the ReLU network.
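As a hedged illustration of the compositional structure the abstract refers to (the specific function and the symbols $h_1$, $h_2$, $h_3$ are assumptions, not taken from the abstract), a function of four variables described by a binary tree could have the form:

```latex
\[
  f(x_1, x_2, x_3, x_4) = h_3\bigl(h_1(x_1, x_2),\; h_2(x_3, x_4)\bigr)
\]
```

A deep network that mirrors this tree only has to approximate the bivariate constituents $h_1$, $h_2$, $h_3$, one per node, whereas a shallow network has to treat $f$ as a generic function of all four variables.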

approximation, fuzzy logic, neural network, (18 more...)

arXiv.org Machine Learning

1905.12882

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

A witness function based construction of discriminative models using Hermite polynomials

Mhaskar, H. N., Cloninger, A., Cheng, X.

arXiv.org Machine Learning, Jan-9-2019

In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution for the $\mathbf{x}_j$'s being $\mu^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multi-variate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a "witness function" in classification problems. Thus, if the value of this estimator at a point $\mathbf{x}$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. This fact is demonstrated in a number of real-world data sets including MNIST, CIFAR10, Science News documents, and LaLonde data sets.
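The thresholding idea in this abstract can be sketched as follows. This is a simplified illustration, not the paper's estimator: it swaps in a Gaussian kernel (the very positive kernel the authors argue against) in place of the Hermite-polynomial kernel, and it uses a hand-picked threshold in place of the permutation test. The function names, the `threshold` and `bandwidth` parameters, and the sample points are all hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def witness(x, class_a, class_b, bandwidth=1.0):
    """Difference of the two class-wise kernel density estimates at x.

    Positive values suggest class A, negative values suggest class B.
    """
    mean_a = sum(gaussian_kernel(x, p, bandwidth) for p in class_a) / len(class_a)
    mean_b = sum(gaussian_kernel(x, p, bandwidth) for p in class_b) / len(class_b)
    return mean_a - mean_b


def classify(x, class_a, class_b, threshold=0.2, bandwidth=1.0):
    """Assign a label only where the witness value clears the threshold."""
    w = witness(x, class_a, class_b, bandwidth)
    if w > threshold:
        return "A"
    if w < -threshold:
        return "B"
    return "uncertain"


# Hypothetical two-class sample in the plane.
class_a = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1)]
class_b = [(3.0, 3.0), (3.1, 2.9), (2.8, 3.2)]

label_near_a = classify((0.1, 0.0), class_a, class_b)
label_between = classify((1.5, 1.5), class_a, class_b)
label_near_b = classify((3.0, 3.0), class_a, class_b)
```

Points close to one class receive that label, while points roughly equidistant from both are left as "uncertain", which is the in-class vs out-of-class distinction the abstract describes.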

health & medicine, neural network, witness function, (19 more...)

arXiv.org Machine Learning

1901.02975

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

How to Choose an Activation Function

Mhaskar, H. N., Micchelli, C. A.

Neural Information Processing Systems, Dec-31-1994

We study the complexity problem in artificial feedforward neural networks designed to approximate real valued functions of several real variables; i.e., we estimate the number of neurons in a network required to ensure a given degree of approximation to every function in a given function class. We indicate how to construct networks with the indicated number of neurons evaluating standard activation functions. Our general theorem shows that the smoother the activation function, the better the rate of approximation.

1 INTRODUCTION

The approximation capabilities of feedforward neural networks with a single hidden layer have been studied by many authors, e.g., [1, 2, 5]. In [10], we have shown that such a network using practically any nonlinear activation function can approximate any continuous function of any number of real variables on any compact set to any desired degree of accuracy. A central question in this theory is the following.

approximation, artificial intelligence, neural network, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Asia > India > NCT (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

How to Choose an Activation Function

Mhaskar, H. N., Micchelli, C. A.

Neural Information Processing Systems, Dec-31-1994

In [10], we have shown that such a network using practically any nonlinear activation function can approximate any continuous function of any number of real variables on any compact set to any desired degree of accuracy. A central question in this theory is the following. If one needs to approximate a function from a known class of functions to a prescribed accuracy, how many neurons will be necessary to accomplish this approximation for all functions in the class?

approximation, artificial intelligence, neural network, (16 more...)

Neural Information Processing Systems

Country: