AITopics | condition number

Collaborating Authors

condition number

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks

Neural Information Processing SystemsJun-22-2026, 19:29:36 GMT

Deployment of neural networks on resource-constrained devices demands models that are both compact and robust to adversarial inputs. However, compression and adversarial robustness often conflict. In this work, we introduce a dynamical lowrank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer. This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing accuracy on clean data. The method is model-and data-agnostic, computationally efficient, and supports rank adaptivity to automatically compress the network at hand. Extensive experiments across standard architectures, datasets, and adversarial attacks show the regularized networks can achieve over 94% compression while recovering or improving adversarial accuracy relative to uncompressed baselines.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.94)
Information Technology > Security & Privacy (0.85)
Government > Military (0.85)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Linearization Explains Fine-Tuning in Large Language Models

Neural Information Processing SystemsJun-22-2026, 13:16:19 GMT

Parameter-Efficient Fine-Tuning (PEFT) is a popular class of techniques that strive to adapt large models in a scalable and resource-efficient manner. Yet, the mechanisms underlying their training performance and generalization remain underexplored. In this paper, we provide several insights into such fine-tuning through the lens of linearization. Fine-tuned models are often implicitly encouraged to remain close to the pretrained model. By making this explicit, using an ℓ2distance inductive bias in parameter space, we show that fine-tuning dynamics become equivalent to learning with the positive-definite neural tangent kernel (NTK). We specifically analyze how close the fully linear and the linearized finetuning optimizations are, based on the strength of the regularization. This allows us to be pragmatic about how good a model linearization is when fine-tuning large language models (LLMs). When linearization is a good model, our findings reveal a strong correlation between the eigenvalue spectrum of the NTK and the performance of model adaptation. Motivated by this, we give spectral perturbation bounds on the NTK induced by the choice of layers selected for fine-tuning.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Better NTKConditioning: AFree Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks

Neural Information Processing SystemsJun-22-2026, 08:06:35 GMT

Nonlinear activation functions are widely recognized for enhancing the expressivity of neural networks, which is the primary reason for their widespread implementation. In this work, we focus on ReLU activation and reveal a novel and intriguing property of nonlinear activations. By comparing enabling and disabling the nonlinear activations in the neural network, we demonstrate their specific effects on wide neural networks: (a) better feature separation, i.e., a larger angle separation for similar data in the feature space of model gradient, and (b) better NTK conditioning, i.e., a smaller condition number of neural tangent kernel (NTK). Furthermore, we show that the network depth (i.e., with more nonlinear activation operations) further amplifies these effects; in addition, in the infinite-width-then-depth limit, all data are equally separated with a fixed angle in the model gradient feature space, regardless of how similar they are originally in the input space. Note that, without the nonlinear activation, i.e., in a linear neural network, the data separation remains the same as for the original inputs and NTK condition number is equivalent to the Gram matrix, regardless of the network depth. Due to the close connection between NTK condition number and convergence theories, our results imply that nonlinear activation helps to improve the worst-case convergence rates of gradient based methods.

artificial intelligence, machine learning, neural network, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification

Neural Information Processing SystemsJun-22-2026, 00:32:41 GMT

Imputation methods play a critical role in enhancing the quality of practical timeseries data, which often suffer from pervasive missing values. Recently, diffusionbased generative imputation methods have demonstrated remarkable success compared to autoregressive and conventional statistical approaches. Despite their empirical success, the theoretical understanding of how well diffusion-based models capture complex spatial and temporal dependencies between the missing values and observed ones remains limited.

data mining, fmult, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.93)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Fast exact recovery of noisy matrix from few entries: the infinity norm approach

Neural Information Processing SystemsJun-18-2026, 05:48:30 GMT

The matrix recovery (completion) problem, a central problem in data science, involves recovering a matrix Afrom a relatively small random set of entries. While such a task is generally impossible, it has been shown that one can recover A exactly in polynomial time, with high probability, under three basic and necessary assumptions: (1) the rank of A is very small compared to its dimensions (low rank), (2) A has delocalized singular vectors (incoherence), and (3) the sample size is sufficiently large. Various algorithms address this task, including convex optimization by Candes, Recht, and Tao (2009), alternating projection by Hardt and Wooters (2014), and low-rank approximation with gradient descent by Keshavan, Montanari, and Oh (2009, 2010). In applications, Candes and Plan (2009) noted that it is more realistic to assume noisy observations. In such cases, the above approaches provide approximate recovery with small root mean square error, which is difficult to convert into exact recovery.

artificial intelligence, assumption, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning

Neural Information Processing SystemsJun-17-2026, 13:52:50 GMT

Matrix preconditioning is a critical technique to accelerate the solution of linear systems, where performance heavily depends on the selection of preconditioning parameters. Traditional parameter selection approaches often define fixed constants for specific scenarios. However, they rely on domain expertise and fail to consider the instance-wise features for individual problems, limiting their performance. In contrast, machine learning (ML) approaches, though promising, are hindered by high inference costs and limited interpretability. To combine the strengths of both approaches, we propose a symbolic discovery framework-namely, Symbolic Matrix Preconditioning (SymMaP)-to learn efficient symbolic expressions for preconditioning parameters. Specifically, we employ a neural network to search the high-dimensional discrete space for expressions that can accurately predict the optimal parameters. The learned expression allows for high inference efficiency and excellent interpretability (expressed in concise symbolic formulas), making it simple and reliable for deployment. Experimental results show that SymMaP consistently outperforms traditional strategies across various benchmarks 1.

artificial intelligence, expression, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Finding Low-Rank Matrix Weights in DNNs via Riemannian Optimization: RAdaGrad and RAdamW

Neural Information Processing SystemsJun-17-2026, 10:09:16 GMT

Finding low-rank matrix weights is a key technique for addressing the high memory usage and computational demands of large models. Most existing algorithms rely on the factorization of the low-rank matrix weights, which is non-unique and redundant. Their convergence is slow especially when the target low-rank matrices are ill-conditioned, because the convergence rate depends on the condition number of the Jacobian operator for the factorization and the Hessian of the loss function with respect to the weight matrix. To address this challenge, we adopt the Riemannian gradient descent (RGD) algorithm on the Riemannian manifold of fixed-rank matrices to update the entire low-rank weight matrix. This algorithm completely avoids the factorization, thereby eliminating the negative impact of the Jacobian condition number.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
(2 more...)

Add feedback

Linear Transformers Implicitly Discover Unified Numerical Algorithms

Neural Information Processing SystemsJun-16-2026, 02:22:20 GMT

A transformer is merely a stack of learned datatodata maps--yet those maps can hide rich algorithms. We train a linear, attention-only transformer on millions of masked-block completion tasks: each prompt is a masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice for Nyström extrapolation. The model sees only input--output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across all three resource regimes (full visibility, bandwidth-limited heads, rank-limited attention). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with compute-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolationhighlighting a powerful capability of in-context learning.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs

Neural Information Processing SystemsJun-15-2026, 16:31:12 GMT

The conjugate gradient solver (CG) is a prevalent method for solving symmetric and positive definite linear systems Ax = b, where effective preconditioners are crucial for fast convergence. Traditional preconditioners rely on prescribed algorithms to offer rigorous theoretical guarantees, while limiting their ability to exploit optimization from data. Existing learning-based methods often utilize Graph Neural Networks (GNNs) to improve the performance and speed up the construction. However, their reliance on incomplete factorization leads to significant challenges: the associated triangular solve hinders GPU parallelization in practice, and introduces long-range dependencies which are difficult for GNNs to model. To address these issues, we propose a learning-based method to generate GPU-friendly preconditioners, particularly using GNNs to construct Sparse Approximate Inverse (SPAI) preconditioners, which avoids triangular solves and requires only two matrix-vector products at each CG step.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

On the Convergence of Stochastic Smoothed Multi-Level Compositional Gradient Descent Ascent

Neural Information Processing SystemsJun-14-2026, 07:01:35 GMT

Multi-level compositional optimization is a fundamental framework in machine learning with broad applications. While recent advances have addressed compositional minimization problems, the stochastic multi-level compositional minimax problem introduces significant new challenges--most notably, the biased nature of stochastic gradients for both the primal and dual variables. In this work, we address this gap by proposing a novel stochastic multi-level compositional gradient descent-ascent algorithm, incorporating a smoothing technique under the nonconvex-PL condition. We establish a convergence rate to an $(\epsilon, \epsilon/\sqrt{\kappa})$-stationary point with improved dependence on the condition number at $O(\kappa^{3/2})$, where $\epsilon$ denotes the solution accuracy and $\kappa$ represents the condition number. Moreover, we design a novel stage-wise algorithm with variance reduction to address the biased gradient issue under the two-sided PL condition. This algorithm successfully enables a translation from and $(\epsilon, \epsilon/\sqrt{\kappa})$-stationary point to an $\epsilon$-stationary point.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback