AITopics | regularity

Convergence of Continual Learning in Homogeneous Deep Networks

Schliserman, Matan, Buzaglo, Gon, Evron, Itay, Soudry, Daniel

arXiv.org Machine Learning, Jun-30-2026

We characterize weakly regularized continual classification in homogeneous models as sequential projections onto task margin sets. This result generalizes prior analyses restricted to either stationary (single-task) deep models or continual linear models. We show that global convergence generally fails, even for simple models linear in data but nonlinear in parameters. Nevertheless, by leveraging results from nonconvex projection theory, we identify regularity properties of homogeneous deep networks that guarantee local linear convergence under random and cyclic task sequences. Finally, we extend our analysis to continual regression, unifying the framework for homogeneous models.
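The "sequential projections onto task margin sets" picture can be illustrated in the simplest setting, a linear model with one labeled example per task, so that each margin set is a half-space. This is a minimal sketch under that assumption, not the paper's method: the function names and the single-example tasks are hypothetical, and the abstract's results concern homogeneous deep networks, where the margin sets are in general nonconvex.

```python
def project_onto_margin_set(w, a, margin=1.0):
    """Euclidean projection of w onto the half-space {v : <a, v> >= margin}.

    Here a = y * x for one labeled example (x, y); this is an assumed toy task.
    """
    dot = sum(ai * wi for ai, wi in zip(a, w))
    if dot >= margin:
        return list(w)  # already inside the margin set
    norm_sq = sum(ai * ai for ai in a)
    step = (margin - dot) / norm_sq
    return [wi + step * ai for wi, ai in zip(w, a)]


def cyclic_projections(w0, task_vectors, n_cycles):
    """Visit the tasks in a fixed cyclic order, projecting onto each margin set."""
    w = list(w0)
    for _ in range(n_cycles):
        for a in task_vectors:
            w = project_onto_margin_set(w, a)
    return w


tasks = [(1.0, 0.0), (-0.5, 1.0)]
w_final = cyclic_projections([0.0, 0.0], tasks, n_cycles=100)
margins = [sum(ai * wi for ai, wi in zip(a, w_final)) for a in tasks]
```

In this convex toy case the iterates approach a point satisfying every task margin; the abstract's point is that such global behavior generally fails once the model is nonlinear in its parameters.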

artificial intelligence, continual learning, machine learning, (16 more...)

arXiv.org Machine Learning

2606.30559

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.40)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Minimax Adaptive Online Nonparametric Regression over Besov spaces

Neural Information Processing Systems, Jun-23-2026, 04:34:30 GMT

This adaptive mechanism adjusts the resolution of the predictions over both time and space, yielding refined regret bounds in terms of local regularity. Consequently, in heterogeneous environments, our adaptive guarantees can significantly surpass those obtained by standard global methods.

artificial intelligence, coefficient, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)

Optimal score function estimation via derivatives constraints

Bonis, Thomas, Ngoc, Thanh Mai Pham, Tran, Viet Chi

arXiv.org Machine Learning, Jun-18-2026

We consider the problem of score function estimation via empirical risk minimization. We start with the question of inferring the score function of a probability measure $μ$ with density on the flat torus from a sample drawn from $μ$. We show that constraining the hypothesis space to a Sobolev ball is sufficient to prevent overfitting and to obtain minimax estimation rates. We then consider the problem of score function estimation in the context of score-based generative modeling. Again, under a conjecture tying the score estimation rates to the quality of the output of a score-based generative model, we obtain minimax rates for such an approach using score function estimators obtained by constraining the hypothesis class to a Sobolev ball.

artificial intelligence, machine learning, score function, (17 more...)

arXiv.org Machine Learning

2606.19084

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

Dölz, Jürgen, Multerer, Michael, Palma, Michele

arXiv.org Machine Learning, May-28-2026

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity that provides a finer notion of robustness. Unlike many existing approaches, DMOC does not require access to model internals and instead evaluates regularity relative to the data distribution. This shifts the focus from the model to the data, which provide a data-driven baseline of regularity against which the network's robustness is assessed. We establish convergence results for DMOC-induced seminorms with explicit data-driven rates in terms of the separation distance, and introduce a scalable minibatch algorithm that reduces the quadratic cost of exact computation, enabling application to large-scale data sets such as ImageNet. Empirically, DMOC serves as an architecture-independent diagnostic: it distinguishes trained from untrained networks, reveals underfitting and overfitting regimes, and yields, as a special case, tight Lipschitz estimates comparable to state-of-the-art methods such as ECLipsE and ECLipsE-fast.
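To make the idea of a discrete modulus of continuity concrete, the sketch below computes, for a finite sample, the largest output change over all pairs of inputs no farther apart than a radius `delta`. This is one standard textbook form of a modulus of continuity restricted to data points; the paper's exact DMOC definition, its seminorms, and its minibatch algorithm may differ, and the function name is hypothetical. The double loop over pairs shows where the quadratic cost of exact computation comes from.

```python
import math
from itertools import combinations


def discrete_modulus(inputs, outputs, delta):
    """Largest output distance over sample pairs whose inputs are within delta.

    inputs[i] and outputs[i] are coordinate tuples for the i-th sample.
    Only black-box outputs are used; no model internals are needed.
    """
    worst = 0.0
    for i, j in combinations(range(len(inputs)), 2):  # O(n^2) pairs
        if math.dist(inputs[i], inputs[j]) <= delta:
            worst = max(worst, math.dist(outputs[i], outputs[j]))
    return worst


# Toy data: f(x) = x^2 evaluated at three points.
xs = [(0.0,), (1.0,), (3.0,)]
ys = [(0.0,), (1.0,), (9.0,)]
profile = [discrete_modulus(xs, ys, d) for d in (0.5, 1.0, 2.0, 3.0)]
```

If the underlying map is Lipschitz with constant `L`, this quantity is at most `L * delta`, which is the sense in which a modulus of continuity refines a single Lipschitz constant.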

artificial intelligence, dmoc, machine learning, (17 more...)

arXiv.org Machine Learning

2605.28729

Country:

North America > Canada (0.69)
Europe (0.68)
North America > United States > California (0.16)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Lin, Likun, Wang, Zhongjian, Xin, Jack, Zhang, Zhiwen

arXiv.org Machine Learning, May-21-2026

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.

machine learning, natural language, target measure, (20 more...)

arXiv.org Machine Learning

2605.21388

Country: North America > United States > Rhode Island (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Barrier-Metric First-Order Method for Linearly Constrained Bilevel Optimization

Hong, Tenglong, Grigas, Paul

arXiv.org Machine Learning, May-13-2026

We study bilevel optimization with a fixed polyhedral lower feasible set. Such problems are challenging for two reasons: active-set changes can make the upper objective nonsmooth, and existing hypergradient methods typically require lower-Hessian inversions or equivalent linear solves, which are computationally expensive. To address these issues, we adopt a logarithmic barrier smoothing of the lower problem to obtain a differentiable approximation of the constrained bilevel objective, and develop a proxy-gradient algorithm for the resulting barrier-smoothed surrogate. The algorithm uses only gradients of the upper and lower objectives; its only second-order object is the explicit logarithmic barrier Hessian determined by the fixed polyhedral constraints. Barrier smoothing restores differentiability, but Euclidean smoothness constants are not uniformly bounded near the boundary. We therefore develop a local Dikin-geometry analysis in which the barrier metric provides an oracle-free curvature scale near the moving lower centers. This leads to barrier-aware schedules that keep the iterates inside locally well-behaved regions. For the barrier-smoothed objective, we prove stationarity rates of $\widetilde{O}(K^{-2/3})$ in the deterministic setting and $\widetilde{O}(K^{-2/5})$ under upper-level-only bounded stochastic noise after $K$ outer iterations, together with quantitative bias control as the barrier parameter decreases.

artificial intelligence, machine learning, neighborhood, (18 more...)

arXiv.org Machine Learning

2605.11476

Country: North America > United States > California (0.45)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

Chizat, Lénaïc, Colombo, Maria, Colombo, Roberto, Fernández-Real, Xavier

arXiv.org Machine Learning, May-12-2026

Stein Variational Gradient Descent (SVGD), introduced in [LW16], is a deterministic interacting-particle method for sampling from a target probability measure $\pi \propto e^{-V}$, only requiring access to $\nabla V$. In the mean-field and continuous-time limit, the distribution of particles converges to a flow $(\rho_t)$ in the space of probability measures that solves a variant of the Fokker-Planck equation with a velocity field smoothed by weighted convolution with a positive definite kernel [LLN19]. This flow can be interpreted as the gradient flow of the relative entropy $H(\cdot \mid \pi)$ with respect to a "kernelized" Wasserstein metric [Liu17]. The goal of this paper is to investigate the convergence of $(\rho_t)$ towards $\pi$. To this end, we focus on the model case of Riesz kernels of order $s$ on the $d$-dimensional torus $\mathbb{T}^d$. This is a family of translation-invariant kernels whose Fourier coefficients decay as $|\xi|^{-2s}$. The parameter $s$ hence directly controls the "smoothing strength" of the interaction; in particular, continuous kernels correspond to $s > d/2$, $C^1$ kernels to $s > (d+1)/2$, and $C^2$ kernels to $s > (d+2)/2$. What is known: qualitative weak convergence. The starting point of convergence analyses is the energy dissipation formula [Liu17] $\frac{d}{dt} H(\rho_t \mid \pi) = -I_s(\rho_t \mid \pi)$ (1.1). Authors are listed in alphabetical order.
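For readers unfamiliar with the particle method behind this flow, here is a minimal sketch of one discrete SVGD update in one dimension. It follows the standard update rule from the original SVGD algorithm, but with an assumed Gaussian (RBF) kernel on the real line rather than the Riesz kernels on the torus studied in the paper; the function name, step size, and bandwidth are illustrative choices. Each particle is moved by a kernel-weighted drift toward high-density regions plus a repulsion term that keeps particles apart.

```python
import math


def svgd_step(particles, grad_V, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles targeting pi proportional to exp(-V).

    Uses the Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 h^2)), an assumption
    made for this sketch. Only grad_V is needed, not the normalizing constant.
    """
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        velocity = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h2))
            drift = -k * grad_V(xj)            # k(x_j, x_i) * d/dx log pi(x_j)
            repulsion = -((xj - xi) / h2) * k  # derivative of k(x_j, x_i) in x_j
            velocity += drift + repulsion
        updated.append(xi + step * velocity / n)
    return updated


# Standard Gaussian target: V(x) = x^2 / 2, so grad V(x) = x.
start = [2.0, 2.5, 3.0, 3.5]
moved = svgd_step(start, grad_V=lambda x: x)

# Flat potential: only the repulsion term acts, so two particles spread apart.
spread = svgd_step([-0.1, 0.1], grad_V=lambda x: 0.0)
```

Because the repulsion contributions cancel when summed over all particles, the mean of `moved` is pulled toward the target's mode by the drift term alone.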

artificial intelligence, kernel, machine learning, (15 more...)

arXiv.org Machine Learning

2605.09456

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Mean Testing under Truncation beyond Gaussian

Wang, Yuhao, Oliveira, Roberto Imbuzeiro, Gouleakis, Themis

arXiv.org Machine Learning, May-5-2026

We characterize the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution $P(\cdot \mid S)$ for an unknown truncation set $S$ that may hide up to an $\varepsilon$-fraction of the probability mass. For distributions with $p$-th directional moments of magnitude at most $ν_{P,p}$, truncation induces a bias of order $O(ν_{P,p}\varepsilon^{1-1/p})$. This bias creates a sharp information-theoretic detectability floor: when the signal $α$ falls below this threshold, the null and alternative hypotheses are indistinguishable even with infinite data. Above this floor, we prove that a simple second-order test achieves near-optimal sample complexity $n = O\!\left(\frac{\|Σ_P\|}{(α-4ν_{P,p}\varepsilon^{1-1/p})^2}\sqrt{d}\right)$. We further identify a structural escape from this finite-moment bias barrier. Under a directional median regularity assumption, truncation bias improves to linear order $O(\varepsilon)$. This reveals an intermediate regime in which estimation requires $Θ(d)$ samples for uniform recovery, while testing recovers the classical $Θ(\sqrt d)$ rate once truncation bias is eliminated. Together, our results provide a unified framework for mean testing under truncation, connecting finite-moment, sub-Gaussian, and median-regular structural regimes.
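The scaling in this abstract can be explored numerically. The sketch below evaluates the bias scale $ν_{P,p}\varepsilon^{1-1/p}$ and the expression inside the stated sample-complexity bound, with the hidden constant of the $O(\cdot)$ set to 1 as a labeled assumption, so the outputs are only meaningful up to constant factors. The function names are hypothetical, and the threshold $4ν_{P,p}\varepsilon^{1-1/p}$ is read off the denominator of the bound as quoted.

```python
import math


def truncation_bias_scale(nu, eps, p):
    """Bias scale nu * eps^(1 - 1/p) for p-th directional moment bound nu."""
    return nu * eps ** (1.0 - 1.0 / p)


def sample_size_scale(alpha, nu, eps, p, d, cov_norm=1.0):
    """Expression inside the O(.) bound, with the hidden constant assumed to be 1.

    Returns infinity when the signal alpha does not exceed the threshold
    4 * nu * eps^(1 - 1/p) appearing in the bound's denominator.
    """
    gap = alpha - 4.0 * truncation_bias_scale(nu, eps, p)
    if gap <= 0.0:
        return math.inf
    return cov_norm * math.sqrt(d) / gap ** 2


# With nu = 1, eps = 0.01, p = 2 the bias scale is 0.1 and the threshold is 0.4.
above = sample_size_scale(alpha=0.9, nu=1.0, eps=0.01, p=2, d=100)
below = sample_size_scale(alpha=0.3, nu=1.0, eps=0.01, p=2, d=100)
```

Raising `p` (stronger moment control) shrinks the bias scale toward `eps`, which matches the abstract's remark that stronger structural assumptions improve the bias to linear order.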

artificial intelligence, machine learning, truncation, (17 more...)

arXiv.org Machine Learning

2605.01335

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

2172fde49301047270b2897085e4319d-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 02:06:24 GMT

artificial intelligence, coordination, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities

Stéphanovitch, Arthur

arXiv.org Machine Learning, Apr-8-2026

Under general assumptions on the target distribution $p^\star$, we establish a sharp Lipschitz regularity theory for flow-matching vector fields and diffusion-model scores, with optimal dependence on time and dimension. As applications, we obtain Wasserstein discretization bounds for Euler-type samplers in dimension $d$: with $N$ discretization steps, the error achieves the optimal rate $\sqrt{d}/N$ up to logarithmic factors. Moreover, the constants do not deteriorate exponentially with the spatial extent of $p^\star$. We also show that the one-sided Lipschitz control yields a globally Lipschitz transport map from the standard Gaussian to $p^\star$, which implies Poincaré and log-Sobolev inequalities for a broad class of probability measures.

artificial intelligence, machine learning, regularity, (18 more...)

arXiv.org Machine Learning

2604.06065

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > United Kingdom (0.04)
Europe > France (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
