Optimal Approximation


The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm

Amsel, Noah, Persson, David, Musco, Christopher, Gower, Robert M.

arXiv.org Artificial Intelligence

Computing the polar decomposition and the related matrix sign function has been a well-studied problem in numerical analysis for decades. Recently, it has emerged as an important subroutine within the Muon algorithm for training deep neural networks. However, the requirements of this application differ sharply from classical settings: deep learning demands GPU-friendly algorithms that prioritize high throughput over high precision. We introduce Polar Express, a new method for computing the polar decomposition. Like Newton-Schulz and other classical polynomial methods, our approach uses only matrix-matrix multiplications, making it very efficient on GPUs. Inspired by earlier work of Chen & Chow and Nakatsukasa & Freund, Polar Express adapts the update rule at each iteration by solving a minimax optimization problem. We prove that this strategy minimizes error in a worst-case sense, allowing Polar Express to converge as rapidly as possible both in the early iterations and asymptotically. We also address finite-precision issues, making it practical to use in bfloat16. When integrated into the Muon training framework, our method leads to consistent improvements in validation loss when training a GPT-2 model on one billion tokens from the FineWeb dataset, outperforming recent alternatives across a range of learning rates.
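
As a rough illustration of the family of methods the abstract refers to, the sketch below implements the classical fixed-coefficient Newton-Schulz iteration in NumPy. Like Polar Express, it uses only matrix-matrix products (which is what makes this family GPU friendly), but it does not use the adaptive minimax coefficients that the paper derives.

```python
import numpy as np

def newton_schulz_polar(A, num_iters=30):
    """Approximate the orthogonal polar factor of A with the classical
    Newton-Schulz iteration X <- 0.5 * X (3 I - X^T X).

    Uses only matrix-matrix products; coefficients are the fixed classical
    ones, NOT the adaptive minimax coefficients of Polar Express.
    """
    # Scale so all singular values lie in (0, 1]; the Frobenius norm is a cheap
    # upper bound on the spectral norm and keeps the iteration in its
    # convergence region.
    X = A / np.linalg.norm(A, "fro")
    I = np.eye(A.shape[1])
    for _ in range(num_iters):
        X = 0.5 * X @ (3.0 * I - X.T @ X)
    return X

# Sanity check against the polar factor from the SVD: A = U diag(s) V^T
# has polar factor U V^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
U_approx = newton_schulz_polar(A)
U, s, Vt = np.linalg.svd(A)
print(np.linalg.norm(U_approx.T @ U_approx - np.eye(64)))  # close to 0 (orthogonality)
print(np.linalg.norm(U_approx - U @ Vt))                   # small residual
```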


Nearly Optimal Approximation of Matrix Functions by the Lanczos Method

Neural Information Processing Systems

Approximating the action of a matrix function f(\vec{A}) on a vector \vec{b} is an increasingly important primitive in machine learning, data science, and statistics, with applications such as sampling high dimensional Gaussians, Gaussian process regression and Bayesian inference, principal component analysis, and approximating Hessian spectral densities. Over the past decade, a number of algorithms enjoying strong theoretical guarantees have been proposed for this task. Many of the most successful belong to a family of algorithms called Krylov subspace methods. Remarkably, a classic Krylov subspace method, called the Lanczos method for matrix functions (Lanczos-FA), frequently outperforms newer methods in practice. Our main result is a theoretical justification for this finding: we show that, for a natural class of rational functions, Lanczos-FA matches the error of the best possible Krylov subspace method up to a multiplicative approximation factor. The approximation factor depends on the degree of f(x)'s denominator and the condition number of \vec{A}, but not on the number of iterations k. Our result provides a strong justification for the excellent performance of Lanczos-FA, especially on functions that are well approximated by rationals, such as the matrix square root.
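
The following sketch is a textbook implementation of Lanczos-FA for a symmetric matrix, with full reorthogonalization for simplicity: it approximates f(A)b as ||b|| Q_k f(T_k) e_1 and is checked here on the matrix square root. The test matrix and iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_fa(A, b, f, k):
    """Lanczos method for matrix functions (Lanczos-FA) for symmetric A.

    Runs k Lanczos iterations to build an orthonormal Krylov basis Q and a
    tridiagonal matrix T, then returns ||b|| * Q f(T) e_1. Only matrix-vector
    products with A are required.
    """
    n = len(b)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    q = b / np.linalg.norm(b)
    q_prev = np.zeros(n)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        w = w - alpha[j] * q - (beta[j - 1] * q_prev if j > 0 else 0)
        # Full reorthogonalization for numerical stability (fine for small k).
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q_prev, q = q, w / beta[j]
    # f(T) e_1 via the eigendecomposition of the k x k tridiagonal T.
    evals, evecs = eigh_tridiagonal(alpha, beta)
    fT_e1 = evecs @ (f(evals) * evecs[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)

# Example: approximate A^{1/2} b for a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
A = M @ M.T + 200 * np.eye(200)
b = rng.standard_normal(200)
approx = lanczos_fa(A, b, np.sqrt, k=20)
evals, evecs = np.linalg.eigh(A)
exact = evecs @ (np.sqrt(evals) * (evecs.T @ b))
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # small relative error
```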


Review for NeurIPS paper: Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions

Neural Information Processing Systems

Summary and Contributions: A soft-max function has two main efficiency measures, approximation and smoothness. The authors' goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. They introduce a soft-max function, called piecewise-linear soft-max, with an optimal tradeoff between approximation, measured as worst-case additive approximation, and smoothness, measured with respect to the L_p norm. The worst-case approximation guarantee of the piecewise-linear mechanism enforces sparsity in the output of the soft-max function, a property that is known to be important in machine learning applications and is not satisfied by the exponential mechanism. Finally, they investigate another soft-max function, called the power mechanism, with an optimal tradeoff between expected multiplicative approximation and smoothness with respect to the Rényi divergence, which provides improved theoretical and practical results in differentially private submodular optimization.
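
For intuition about the approximation-smoothness tradeoff discussed above, the snippet below probes the standard exponential soft-max (not the paper's piecewise-linear or power mechanisms): lowering the temperature shrinks the additive gap to the true maximum but increases sensitivity to input perturbations. The dimensions, temperatures, and perturbation size are illustrative choices.

```python
import numpy as np

def exp_softmax(x, temp):
    """Standard exponential soft-max with temperature `temp`."""
    z = (x - x.max()) / temp          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(50)           # scores to "soft-maximize"
tie = np.zeros(50)                    # worst case for smoothness: all scores tied
delta = 1e-3

for temp in [1.0, 0.1, 0.01]:
    p = exp_softmax(x, temp)
    gap = x.max() - p @ x             # additive approximation error vs. the true max
    tie2 = tie.copy(); tie2[0] += delta
    sens = np.abs(exp_softmax(tie2, temp) - exp_softmax(tie, temp)).sum() / delta
    print(f"temp={temp:5.2f}   additive gap={gap:.4f}   l1 sensitivity~{sens:.3f}")
```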


Review for NeurIPS paper: Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions

Neural Information Processing Systems

This paper studies trade-offs between the approximation quality and smoothness of "soft"-max functions. The natural exponential soft-max has an optimal tradeoff between expected additive approximation and smoothness measured with respect to the Rényi divergence, but is suboptimal when smoothness is measured via L_p norms. The authors present a new piecewise-linear soft-max function which is optimal for L_p norms. The reviewers found this to be a well-written and thorough paper on an important problem of broad interest in machine learning. I recommend this paper for acceptance.


Optimal approximation using complex-valued neural networks

Neural Information Processing Systems

Complex-valued neural networks (CVNNs) have recently shown promising empirical success, for instance for increasing the stability of recurrent neural networks and for improving the performance in tasks with complex-valued inputs, such as MRI fingerprinting. While the overwhelming success of Deep Learning in the real-valued case is supported by a growing mathematical foundation, such a foundation is still largely lacking in the complex-valued case. We thus analyze the expressivity of CVNNs by studying their approximation properties. Our results yield the first quantitative approximation bounds for CVNNs that apply to a wide class of activation functions including the popular modReLU and complex cardioid activation functions. Precisely, our results apply to any activation function that is smooth but not polyharmonic on some non-empty open set; this is the natural generalization of the class of smooth and non-polynomial activation functions to the complex setting.
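
For reference, the commonly cited definitions of the two activation functions named in the abstract can be written in a few lines of NumPy; the bias b in modReLU and the test inputs below are illustrative choices, not taken from the paper.

```python
import numpy as np

def modrelu(z, b):
    """modReLU activation for complex inputs (Arjovsky et al., 2016):
    passes the modulus through a ReLU with bias b while preserving the phase.
    """
    mag = np.abs(z)
    phase = z / np.where(mag > 0, mag, 1.0)      # avoid division by zero at z = 0
    return np.where(mag + b > 0, (mag + b) * phase, 0.0)

def complex_cardioid(z):
    """Complex cardioid activation (Virtue et al., 2017): attenuates z based on
    its phase, acting like ReLU on the real axis but phase-sensitive in general.
    """
    return 0.5 * (1.0 + np.cos(np.angle(z))) * z

z = np.array([1 + 1j, -2 + 0.5j, 0.1 - 0.3j, -0.05 + 0.0j])
print(modrelu(z, b=-0.5))
print(complex_cardioid(z))
```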


Polynomial-time derivation of optimal k-tree topology from Markov networks

Dastjerdi, Fereshteh R., Cai, Liming

arXiv.org Machine Learning

Characterization of the joint probability distribution for large networks of random variables remains a challenging task in data science. Probabilistic graph approximation with simple topologies has been the practical recourse; typically, a tree topology makes joint probability computation much simpler and can be effective for statistical inference on insufficient data. However, to characterize network components where multiple variables cooperate closely to influence others, model topologies beyond a tree are needed, which unfortunately are infeasible to acquire. In particular, our previous work has closely related optimal approximation of Markov networks of tree-width k >= 2 to the graph-theoretic problem of finding a maximum spanning k-tree (MSkT), which is a provably intractable task. This paper investigates optimal approximation of Markov networks with k-tree topology that retains some designated underlying subgraph. Such a subgraph may encode certain background information that arises in scientific applications, for example, about a known significant pathway in gene networks or the indispensable backbone connectivity in the residue interaction graphs for a biomolecule's 3D structure. In particular, it is proved that the \beta-retaining MSkT problem, for a number of classes \beta of graphs, admits O(n^{k+1})-time algorithms for every fixed k >= 1. These \beta-retaining MSkT algorithms offer efficient solutions for approximating Markov networks with k-tree topology in situations where certain persistent information needs to be retained.
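
The k = 1 case of this approximation problem is the classical Chow-Liu algorithm, which the sketch below implements from scratch (empirical mutual information plus a maximum spanning tree). It is meant only to illustrate the tree-topology baseline, not the paper's \beta-retaining MSkT algorithms; the toy data are illustrative.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) between discrete samples x, y."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_tree(data):
    """Chow-Liu algorithm: the k = 1 (tree) case of optimal k-tree approximation.

    Builds a maximum spanning tree over pairwise mutual information, which gives
    the KL-optimal tree-structured approximation of the joint distribution.
    `data` is an (n_samples, n_variables) array of discrete observations.
    """
    d = data.shape[1]
    weight = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        weight[i, j] = weight[j, i] = mutual_information(data[:, i], data[:, j])
    # Prim's algorithm for a maximum spanning tree.
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        i, j = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: weight[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Toy data: X2 is a noisy copy of X0, X1 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 2000)
x1 = rng.integers(0, 2, 2000)
x2 = np.where(rng.random(2000) < 0.9, x0, 1 - x0)
print(chow_liu_tree(np.column_stack([x0, x1, x2])))  # expect the edge (0, 2) in the tree
```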


Clustering what Matters: Optimal Approximation for Clustering with Outliers

Agrawal, Akanksha (Indian Institute of Technology, Madras) | Inamdar, Tanmay (University of Bergen) | Saurabh, Saket (Institute of Mathematical Sciences, Chennai, India) | Xue, Jie (NYU Shanghai)

Journal of Artificial Intelligence Research

Clustering with outliers is one of the most fundamental problems in Computer Science. Given a set X of n points and two numbers k, m, clustering with outliers aims to exclude m points from X and partition the remaining points into k clusters so as to minimize a certain cost function. In this paper, we give a general approach for solving clustering with outliers, which results in a fixed-parameter tractable (FPT) algorithm in k and m—i.e., an algorithm with running time of the form f(k, m) · n^{O(1)} for some function f—that almost matches the approximation ratio for its outlier-free counterpart. As a corollary, we obtain FPT approximation algorithms with optimal approximation ratios for k-Median and k-Means with outliers in general and Euclidean metrics. We also exhibit more applications of our approach to other variants of the problem that impose additional constraints on the clustering, such as fairness or matroid constraints.
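
To make the problem setup concrete, the sketch below implements a simple Lloyd-style heuristic in the spirit of "k-means--" (Chawla & Gionis), which drops the m farthest points at each step. It is an illustrative baseline only, not the FPT approximation algorithm of the paper; the synthetic data are illustrative.

```python
import numpy as np

def kmeans_minus_minus(X, k, m, iters=50, seed=0):
    """Heuristic for k-means with m outliers: at each step, ignore the m points
    farthest from their nearest center, then update centers from the rest.
    NOT the paper's FPT scheme; just a baseline illustrating the objective.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # squared distances
        nearest = d2.argmin(1)
        nearest_d2 = d2[np.arange(len(X)), nearest]
        inliers = np.argsort(nearest_d2)[: len(X) - m]              # exclude m outliers
        for c in range(k):
            pts = X[inliers[nearest[inliers] == c]]
            if len(pts) > 0:
                centers[c] = pts.mean(0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    outliers = np.argsort(d2.min(1))[len(X) - m:]
    return centers, outliers

# Two Gaussian clusters plus a few far-away outlier points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2)),
               rng.uniform(30, 40, (5, 2))])
centers, outliers = kmeans_minus_minus(X, k=2, m=5)
print(centers)           # typically close to the two cluster means
print(sorted(outliers))  # the far-away points are typically the ones excluded
```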


Conjugate Natural Selection

Raab, Reilly, de Alfaro, Luca, Liu, Yang

arXiv.org Artificial Intelligence

Evolution describes how distributions change. Specifically, evolution provides a model for how a population's distribution of traits or strategies (hereafter hypotheses) changes over time as an environment modulates reproduction rates (i.e., of individuals or of hypotheses; Lloyd, 2020): Hypotheses that have higher fitness are "selected" by the environment and, in expectation, become more popular with time. The replicator equation is a formal, analytic model of evolution and is indispensable to biology (Sinervo and Calsbeek, 2006; Queller, 2017). In the replicator equation, the absolute fitness (in this paper, the negative loss L) of a hypothesis h ∈ H is identified with its rate of replication: exponential growth (or decline) in a population where different hypotheses compete for relative frequency ρ(h) ∈ [0, 1]. For probability distributions over the hypothesis space H, this equation induces replicator dynamics, selecting hypotheses with lower-than-average loss.
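
A discretized version of the replicator dynamics described here is easy to simulate; the sketch below takes Euler steps with fitness equal to the negative loss, with the step size and loss values chosen purely for illustration.

```python
import numpy as np

def replicator_step(rho, loss, dt=0.01):
    """One Euler step of the replicator equation with fitness f = -loss:
    d rho(h) / dt = rho(h) * ( f(h) - sum_h' rho(h') f(h') ).
    Hypotheses with lower-than-average loss grow in relative frequency.
    """
    fitness = -loss
    avg_fitness = rho @ fitness
    rho = rho + dt * rho * (fitness - avg_fitness)
    return rho / rho.sum()   # renormalize to keep a probability distribution

# Three hypotheses with losses 1.0, 0.5, 0.2; start from the uniform distribution.
rho = np.ones(3) / 3
loss = np.array([1.0, 0.5, 0.2])
for _ in range(2000):
    rho = replicator_step(rho, loss)
print(rho)   # mass concentrates on the lowest-loss hypothesis
```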


Optimal Approximation with Sparse Neural Networks and Applications

Hong, Khay Boon

arXiv.org Artificial Intelligence

We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb R^d)$ by restricting the connectivity and the memory required to store the networks. We also introduce the notion of a representation system (a countable collection of functions) to guide neural networks, since approximation theory with representation systems is well developed in mathematics. We then prove the fundamental bound theorem, implying that a quantity intrinsic to the function class itself gives information about the approximation ability of neural networks and representation systems. We also provide a method for transferring existing theories about approximation by representation systems to neural networks, greatly amplifying the practical value of neural networks. Finally, we use neural networks to approximate B-spline functions, which are used to generate B-spline curves. We then analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and wedgelet constructions.
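
The B-spline basis functions mentioned at the end of the abstract are defined by the Cox-de Boor recursion; the sketch below evaluates them and checks the partition-of-unity property on a uniform knot vector. The degree and knot choices are illustrative, not taken from the paper.

```python
import numpy as np

def bspline_basis(i, p, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p
    evaluated at the points t, given a non-decreasing knot vector `knots`.
    """
    if p == 0:
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    left = 0.0 if left_den == 0 else (t - knots[i]) / left_den * bspline_basis(i, p - 1, t, knots)
    right = 0.0 if right_den == 0 else (knots[i + p + 1] - t) / right_den * bspline_basis(i + 1, p - 1, t, knots)
    return left + right

# Quadratic B-spline basis functions on a uniform knot vector; on the interior
# of the knot span they sum to 1 (partition of unity).
knots = np.arange(8, dtype=float)            # 0, 1, ..., 7
t = np.linspace(2.0, 5.0, 7, endpoint=False)
B = np.array([bspline_basis(i, 2, t, knots) for i in range(5)])
print(B.sum(axis=0))                         # all (close to) 1 inside [2, 5)
```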


A Simple Proof of Optimal Approximations

Csikós, Mónika, Mustafa, Nabil H.

arXiv.org Machine Learning

The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, combinatorics, and data analysis (described as 'the pinnacle of a long sequence of papers'). The goal of this paper is to give a simpler, self-contained, modular proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff concentration bound. This makes the proof accessible to a wider audience, including readers not familiar with techniques from statistical learning theory, and makes it possible to cover it in a single self-contained lecture in an algorithms course.
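
The flavor of the result can be seen in the simpler Chernoff-plus-union-bound statement: a uniform sample of size O(log(m/delta)/eps^2) is an eps-approximation of a finite set system with m sets, with probability at least 1 - delta. (The Li-Long-Srinivasan result sharpens this dependence for set systems of bounded VC dimension.) The snippet below checks this empirically; the set system (random intervals) and parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite set system: ground set {0, ..., n-1}, sets = m random intervals.
n, m = 10_000, 200
ground = np.arange(n)
sets = []
for _ in range(m):
    a, b = sorted(rng.integers(0, n, 2))
    sets.append((a, b))                       # the set {a, a+1, ..., b-1}

# Sample size from Chernoff/Hoeffding + a union bound over the m sets.
eps, delta = 0.05, 0.01
sample_size = int(np.ceil(np.log(2 * m / delta) / (2 * eps ** 2)))
sample = rng.choice(ground, size=sample_size, replace=True)

worst = 0.0
for a, b in sets:
    true_frac = (b - a) / n                   # measure of the set in the ground set
    sample_frac = np.mean((sample >= a) & (sample < b))
    worst = max(worst, abs(true_frac - sample_frac))
print(f"sample size = {sample_size}, worst discrepancy = {worst:.4f} (eps = {eps})")
```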