Pathway to $O(\sqrt{d})$ Complexity bound under Wasserstein metric of flow-based models
Meng, Xiangjun, Wang, Zhongjian
We provide analytical tools to estimate the error of flow-based generative models under the Wasserstein metric and to establish the optimal sampling iteration complexity bound with respect to dimension as $O(\sqrt{d})$. We show this error can be explicitly controlled by two parts: the Lipschitzness of the push-forward maps of the backward flow, which scales independently of the dimension, and a local discretization error, which scales as $O(\sqrt{d})$ in the dimension. The former is related to the existence of Lipschitz changes of variables induced by the (heat) flow; the latter depends on the regularity of the score function in both the spatial and temporal directions. These assumptions are valid for the flow-based generative models associated with the Föllmer process and the $1$-rectified flow under a Gaussian tail assumption. As a consequence, we show that the sampling iteration complexity grows linearly with the square root of the trace of the covariance operator, which is related to the invariant distribution of the forward process.
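The $O(\sqrt{d})$ scaling of the discretization error can be seen in a toy setting where everything is explicit: for a Gaussian target under an Ornstein-Uhlenbeck forward process, the score is available in closed form and the probability flow ODE reduces to a linear rescaling, so the Euler error is coordinate-wise and its norm grows like $\sqrt{d}$. The sketch below is illustrative only (the process, step counts, and variances are hypothetical choices, not the paper's construction):

```python
import numpy as np

def sigma2(t, s0=0.25):
    # Marginal variance of the OU flow dX = -X dt + sqrt(2) dW started
    # from N(0, s0 * I); the stationary law is N(0, I).
    return s0 * np.exp(-2 * t) + 1 - np.exp(-2 * t)

def euler_backward(x, T=4.0, n_steps=100):
    """Plain Euler on the probability flow ODE dx/dt = x*(1/sigma_t^2 - 1),
    integrated from t = T down to t = 0 with the exact Gaussian score."""
    h = T / n_steps
    for k in range(n_steps):
        t = T - k * h
        x = x - h * x * (1.0 / sigma2(t) - 1.0)
    return x

def exact_map(x, T=4.0):
    # For a Gaussian target the backward flow is the linear scaling
    # x -> (sigma_0 / sigma_T) * x, so the exact solution is known.
    return x * np.sqrt(sigma2(0.0)) / np.sqrt(sigma2(T))

rng = np.random.default_rng(0)
errs = {}
for d in (16, 1024):
    x_T = rng.standard_normal(d)
    errs[d] = np.linalg.norm(euler_backward(x_T) - exact_map(x_T))

ratio = errs[1024] / errs[16]   # expect roughly sqrt(1024 / 16) = 8
```

Because the per-coordinate error is dimension-independent here, the norm of the error vector inherits the $\sqrt{d}$ growth the abstract describes.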
Deep Neural Operator Learning for Probabilistic Models
Bayraktar, Erhan, Feng, Qi, Zhang, Zecheng, Zhang, Zhaoyu
We propose a deep neural-operator framework for a general class of probability models. Under global Lipschitz conditions on the operator over the entire Euclidean space, and for a broad class of probabilistic models, we establish a universal approximation theorem with explicit network-size bounds for the proposed architecture. The underlying stochastic processes are required only to satisfy integrability and general tail-probability conditions. We verify these assumptions for both European and American option-pricing problems within the forward-backward SDE (FBSDE) framework, which in turn covers a broad class of operators arising from parabolic PDEs, with or without free boundaries. Finally, we present a numerical example for a basket of American options, demonstrating that the learned model produces optimal stopping boundaries for new strike prices without retraining.
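A minimal sketch of the operator-learning ansatz such frameworks build on, in the DeepONet style: a branch net encodes the input function (e.g. a payoff) sampled at sensor points, a trunk net encodes a query point, and the operator output is their inner product. The weights below are random and untrained, and the (spot, time, strike) query layout and layer sizes are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP parameters (illustrative; a real model is trained)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Branch net: encodes the input function sampled at m sensor points.
# Trunk net:  encodes a query point, here (spot, time-to-maturity, strike).
m, p = 32, 16
branch = mlp([m, 64, p])
trunk = mlp([3, 64, p])

def deeponet(u_sensors, y):
    """G(u)(y) ~ <branch(u), trunk(y)>: the operator-network ansatz."""
    return forward(branch, u_sensors) @ forward(trunk, y)

u = rng.standard_normal(m)         # payoff samples at the sensors (toy input)
y = np.array([100.0, 0.5, 95.0])   # hypothetical (spot, maturity, strike) query
price = deeponet(u, y)
```

Evaluating at a new strike only changes the trunk input `y`, which is the sense in which a trained operator can price new strikes without retraining.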
Non-asymptotic error bounds for probability flow ODEs under weak log-concavity
Kremling, Gitte, Iafrate, Francesco, Taheri, Mahsa, Lederer, Johannes
Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target distribution, such as strong log-concavity or bounded support. This work establishes non-asymptotic convergence bounds in the 2-Wasserstein distance for a general class of probability flow ODEs under considerably weaker assumptions: weak log-concavity and Lipschitz continuity of the score function. Our framework accommodates non-log-concave distributions, such as Gaussian mixtures, and explicitly accounts for initialization errors, score approximation errors, and the effects of discretization via an exponential integrator scheme. Bridging a key theoretical gap in diffusion-based generative modeling, our results extend convergence theory to more realistic data distributions and practical ODE solvers. We provide concrete guarantees for the efficiency and correctness of the sampling algorithm, complementing the empirical success of diffusion models with rigorous theory. Moreover, from a practical perspective, our explicit rates might be helpful in choosing hyperparameters, such as the step size in the discretization.
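The exponential integrator mentioned above treats the linear part of the probability-flow drift exactly and freezes the score over each step. In the Gaussian toy case the score is closed-form and the exact backward transport is the linear map $x \mapsto (\sigma_0/\sigma_T)\,x$, so the scheme can be checked directly. A hedged sketch (the OU forward process, horizon, and step counts are illustrative choices, not the paper's setting):

```python
import numpy as np

def sigma2(t, s0=0.25):
    # Marginal variance of the OU forward process dX = -X dt + sqrt(2) dW
    # started from N(0, s0 * I); the stationary law is N(0, I).
    return s0 * np.exp(-2 * t) + 1 - np.exp(-2 * t)

def score(x, t):
    # Closed-form score of the Gaussian marginal N(0, sigma_t^2 I).
    return -x / sigma2(t)

def exp_integrator_sample(x, T=4.0, n_steps=800):
    """Integrate the probability flow ODE dx/dt = -x - score(x, t) from
    t = T down to t = 0.  In backward time dx/dtau = x + score(x, t), and
    the exponential integrator solves the linear part exactly while
    holding the score fixed over each step."""
    h = T / n_steps
    for k in range(n_steps):
        t = T - k * h
        x = np.exp(h) * x + (np.exp(h) - 1.0) * score(x, t)
    return x

# Starting near the stationary law N(0, I), the sampler should contract a
# point toward the target scale sigma_0 = 0.5 per coordinate.
x0 = exp_integrator_sample(np.ones(3))
```

Note that when `sigma2(t) == 1` the update is exactly the identity, i.e. the scheme is exact on the stationary part of the trajectory, which is the practical advantage over plain Euler.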
Exact Dynamics of Multi-class Stochastic Gradient Descent
Collins-Woodfin, Elizabeth, Seroussi, Inbar
We develop a framework for analyzing the training and learning-rate dynamics on a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of functions of the limiting dynamics, including the risk and the overlap with the true signal, in terms of a deterministic solution to a system of ODEs. We extend the existing theory of high-dimensional SGD dynamics to Gaussian-mixture data and a large (growing with the parameter size) number of classes. We then investigate in detail the effect of the anisotropic structure of the covariance of the data in the problems of binary logistic regression and least-squares loss. We study three cases: isotropic covariances, data covariance matrices with a large fraction of zero eigenvalues (denoted as the zero-one model), and covariance matrices with spectra following a power-law distribution. We show that there exists a structural phase transition. In particular, we demonstrate that, for the zero-one model and the power-law model with sufficiently large power, SGD tends to align more closely with the components of the class mean projected onto the "clean directions" (i.e., directions of smaller variance). This is supported by both numerical simulations and analytical studies, which show the exact asymptotic behavior of the loss in the high-dimensional limit.
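A toy one-pass SGD run makes the quantities in the abstract concrete: with binary Gaussian-mixture data whose noise is anisotropic, the iterate's overlap with the class mean can be tracked directly. Everything below (dimension, step size, mean placement, covariance profile) is a hypothetical toy setup, not the paper's exact model or its ODE analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
mu = np.zeros(d)
mu[0] = 2.0                                    # class mean along direction 0
noise_std = np.sqrt(np.linspace(0.1, 2.0, d))  # anisotropic: direction 0 is "clean"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
eta = 0.05
for _ in range(5000):                  # one fresh sample per step (one-pass SGD)
    y = rng.choice([-1.0, 1.0])        # balanced binary labels
    x = y * mu + noise_std * rng.standard_normal(d)
    margin = np.clip(y * (w @ x), -30.0, 30.0)   # clip to keep exp() stable
    w += eta * y * x * sigmoid(-margin)          # SGD step on the logistic loss

# Overlap of the iterate with the true class mean (cosine similarity).
overlap = (w @ mu) / (np.linalg.norm(w) * np.linalg.norm(mu))
```

Here the mean sits in a low-variance ("clean") direction, so the overlap grows quickly; repeating the run with the mean in a high-variance direction is an easy way to see the anisotropy effect qualitatively.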
Universality in Transfer Learning for Linear Models
We study the problem of transfer learning and fine-tuning in linear models for both regression and binary classification. In particular, we consider the use of stochastic gradient descent (SGD) on a linear model initialized with pretrained weights and using a small training data set from the target distribution.
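The setting described can be sketched numerically: fine-tune a linear model from pretrained (source-task) weights with one pass of SGD over a small target-distribution sample, and compare against training from scratch. The dimensions, step size, and task-shift magnitude below are illustrative assumptions, not the paper's regime:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 100, 50                         # few target samples relative to dimension

w_src = rng.standard_normal(d) / np.sqrt(d)                 # pretrained weights
w_tgt = w_src + 0.1 * rng.standard_normal(d) / np.sqrt(d)   # nearby target task

X = rng.standard_normal((n, d))
y = X @ w_tgt                          # noiseless linear regression target data

def sgd(w0, lr=0.005):
    """One pass of SGD on the squared loss over the n target samples."""
    w = w0.copy()
    for x_i, y_i in zip(X, y):
        w -= lr * (w @ x_i - y_i) * x_i
    return w

# Population risk for x ~ N(0, I) is the squared distance to the target.
risk = lambda w: np.sum((w - w_tgt) ** 2)

w_ft = sgd(w_src)                      # fine-tune from pretrained weights
w_sc = sgd(np.zeros(d))                # train from scratch
```

With n < d the scratch run cannot identify the target, while initializing at the pretrained weights starts close to it, so fine-tuning ends at a much smaller population risk; that gap is the phenomenon the analysis quantifies.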