AITopics

2605.12977

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsApr-29-2026, 08:40:03 GMT

aa5f224975a67914067519faddeacba3-Supplemental-Conference.pdf

artificial intelligence, machine learning, vec, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing SystemsApr-25-2026, 08:10:53 GMT

paper-oras-neurips

ali taghibakhshi

Domain decomposition methods are widely used and effective in the approximation of solutions to partial differential equations. Yet the optimal construction of these methods requires tedious analysis and is often available only in simplified, structured-grid settings, limiting their use for more complex problems. In this work, we generalize optimized Schwarz domain decomposition methods to unstructured-grid problems, using Graph Convolutional Neural Networks (GCNNs) and unsupervised learning to learn optimal modifications at subdomain interfaces. A key ingredient in our approach is an improved loss function, enabling effective training on relatively small problems, but robust performance on arbitrarily large problems, with computational cost linear in problem size. The performance of the learned linear solvers is compared with both classical and optimized domain decomposition algorithms, for both structured-and unstructured-grid problems.

artificial intelligence, machine learning, subdomain, (18 more...)

Country:

North America > Canada > Newfoundland and Labrador (0.28)
North America > United States > Illinois > Champaign County > Urbana (0.15)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsApr-25-2026, 01:02:24 GMT

Appendix details of the proposed method

In this section, we provide further intuition about the proposed AdaQN method. As shown in Figure 5, in AdaQN we need to ensure that the approximate solution of the ERM problem with m samples denoted by wm is within the superlinear convergence neighborhood of BFGS for the ERM problem with n = 2msamples. Here, w m and w n are the optimal solutions of the risks Rm and Rn corresponding to the sets Sm and Sn with mand nsamples, respectively, where Sm Sn. The statistical accuracy region of Rm is denoted by a blue circle, the statistical accuracy region of Rn is denoted by a red circle, and the superlinear convergence neighborhood of BFGS for Rn is denoted by a dotted purple circle. As we observe, any point within the statistical accuracy of w m is within the superlinear convergence neighborhood of BFGS for Rn.

artificial intelligence, machine learning, statistical accuracy, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsApr-25-2026, 00:09:25 GMT

The alignment property of SGD noise and how it helps select flat minima: A stability analysis

The phenomenon that stochastic gradient descent (SGD) favors flat minima has played a critical role in understanding the implicit regularization of SGD. In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss. We prove that if a global minimum θ is linearly stable for SGD, then it must satisfy H(θ) F O( B/η), where H(θ) F,B,η denote the Frobenius norm of Hessian at θ, batch size, and learning rate, respectively. Otherwise, SGD will escape from that minimum exponentially fast. Hence, for minima accessible to SGD, the sharpness--as measured by the Frobenius norm of the Hessian--is bounded independently of the model size and sample size. The key to obtaining these results is exploiting the particular structure of SGD noise: The noise concentrates in sharp directions of local landscape and the magnitude is proportional to loss value. This alignment property of SGD noise provably holds for linear networks and random feature models (RFMs), and is empirically verified for nonlinear networks. Moreover, the validity and practical relevance of our theoretical findings are also justified by extensive experiments on CIFAR-10 dataset.

artificial intelligence, machine learning, sgd, (16 more...)

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Zhou, Yichen, Tu, Stephen

CLT-Optimal Parameter Error Bounds for Linear System Identification

arXiv.org Machine LearningApr-24-2026

There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.

artificial intelligence, assumption 2, machine learning, (18 more...)

2604.2127

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Liu, Tianhao, Díaz-Pachón, Daniel Andrés, Rao, J. Sunil

PRIM-cipal components analysis

arXiv.org Machine LearningApr-20-2026

EVEN supervised learning is subject to the famous NoFree Lunch Theorems [1]-[3], which say that, in combinatorial optimization, there is no universal algorithm that works better than its competitors for every objective function [4]-[6]. Indeed, David Wolpert has recently proven that, on average, cross-validation performs as well as anti-crossvalidation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) for supervised learning. Still, he acknowledges that "it is hard to imagine any scientist who would not prefer to use [crossvalidation] to using anti-cross-validation" [7]. On the other hand, unsupervised learning has seldom been studied from the perspective of the NFLTs. This may be because the adjective "unsupervised" suggests that no human input is needed, which is misleading as many unsupervised tasks are combinatorial optimization problems that depend on the choice of the objective function. For instance, it is well known that, among the eigenvectors of the covariance matrix, Principal Components Analysis selects those with the largest variances [8]. However, mode-hunting techniques that rely on spectral manipulation aim at the opposite objective: selecting the eigenvectors of the covariance matrix with the smallest variances [9], [10]. Therefore, unlike in supervised learning, where it is difficult to identify reasons to optimize with respect to anti-cross-validation, in unsupervised learning there are strong reasons to reduce dimensionality for variance minimization. D. A. D ıaz-Pach on and T. Liu are with the Division of Biostatistics, University of Miami, Miami, FL, 33136 USA (e-mail: ddiaz3@miami.edu,

artificial intelligence, machine learning, principal component, (18 more...)

2604.15538

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Florida > Miami-Dade County > Miami (0.24)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

arXiv.org Machine LearningMar-31-2026

Machine Learning-Assisted High-Dimensional Matrix Estimation

Tian, Wan, Yang, Hui, Lian, Zhouhui, Zhang, Lingyue, Peng, Yijie

Efficient estimation of high-dimensional matrices--including covariance and precision matrices--is a cornerstone of modern multivariate statistics. Most existing studies have focused primarily on the theoretical properties of the estimators (e.g., consistency and sparsity), while largely overlooking the computational challenges inherent in high-dimensional settings. Theoretically, we first prove the convergence of LADMM, and then establish the convergence, convergence rate, and monotonicity of its reparameterized counterpart; importantly, we show that the reparameterized LADMM enjoys a faster convergence rate. Notably, the proposed reparameterization theory and methodology are applicable to the estimation of both high-dimensional covariance and precision matrices. Keywords: ADMM; High-dimensional; Learning-based optimization; Matrix estimation. 1. Introduction High-dimensional matrix estimation--covering both covariance and precision matrix estimation--constitutes a cornerstone of modern statistics and data science [1, 2, 3]. Accurate covariance estimation enables the characterization of dependence structures among a large number of variables [4, 5, 6], which is indispensable in diverse domains such as genomics [7, 8], neuroscience [9], finance [10, 11, 12], and climate science [13, 14]. Over the past two decades, substantial progress has been made in the statistical theory of high-dimensional matrix estimation, particularly with respect to the accuracy of estimators, including properties such as sparsistency and consistency [5, 15, 16]. However, in empirical studies, the dimensionality is often only on the order of tens to hundreds, and in many cases is comparable to the sample size [21, 22, 23, 24]. This observation highlights a notable gap between the statistical theory of estimators and the practical challenges of their computational implementation.

artificial intelligence, machine learning, optimization problem, (16 more...)