Goto

Collaborating Authors

 interpolation


A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots

arXiv.org Machine Learning

The population dynamics of molecules, cells, and organisms are governed by a number of unknown forces. In the last decade, population dynamics have predominantly been modeled with Wasserstein gradient flows. However, since gradient flows minimize free energy, they fail to capture important dynamical properties, such as periodicity. In this work, we propose a change in perspective by considering dynamics that minimize a population-level action under a damped Wasserstein Lagrangian. By deriving the corresponding Hamiltonian equations of motion, we formalize Wasserstein Lagrangian Mechanics, a structured class of second-order dynamics that encompasses classical mechanics, quantum mechanics, and gradient flows. We then propose WLM as the first algorithm that learns these second-order dynamics from observed marginals, without specifying the Lagrangian. By directly learning the population mechanics, WLM can both forecast and interpolate unseen marginals, and outperforms existing gradient flow and flow matching methods across a wide range of dynamics, including vortex dynamics, embryonic development, and flocking.



Null Measurability at the Symmetrization Interface in VC Learning

arXiv.org Machine Learning

Recent work revisiting measurability in the fundamental theorem of statistical learning imposes Borel measurability of ghost-gap suprema. We show that, at the one-sided ghost-gap interface actually used by the standard symmetrization proof, this requirement is stronger than necessary. For any Borel-parameterized concept class on a Polish domain, the bad event "there exists a hypothesis whose ghost empirical error exceeds its training empirical error by at least ε/2" is analytic. By Choquet capacitability, it is therefore measurable in the completion of every finite Borel measure. We then construct a concept class whose bad event is null-measurable but not Borel, giving a strict separation from the Borel supremum condition. Finally, we prove closure under patching, fixed and countable interpolation, and fiber-product amalgamation, showing that the weaker regularity level is stable under natural concept-class constructors. In the realizable setting, where targets belong to the class and are measurable, these results weaken the measurability hypothesis needed by the symmetrization route from finite VC dimension to PAC learnability. The main results and the descriptive-set-theoretic infrastructure used by them are formalized in Lean 4.


Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

arXiv.org Machine Learning

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the largedimensional setting where the sample size and dimension are of comparable order, i.e., n dγ for some γ > 0. We first consider inner-product kernels on the sphere Sd 1 and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions s 0, where smeasures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever sis positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in Rd whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions. Keywords: Spectral algorithms, learning curves, high dimension, benign overfitting. 1 Introduction Nonparametric regression studies the estimation of an unknown function f: Rd R from ni.i.d.


Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Neural Information Processing Systems

The proposed SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.





scaleKernelMatrix

Neural Information Processing Systems

Kernel matrix-vector multiplication (KMVM) is one of the most important operations needed in scientific computing with core applications indiffeomorphic registration, geometric learning [11], [31],numerical analysis [28],fluid dynamics [6],and machine learning [27].