




Quantitative Propagation of Chaos for SGD in Wide Neural Networks

Neural Information Processing Systems

Supplementary material excerpt (table of contents): S2.1 Presentation of the modified SGLD and its continuous counterpart; S2.2 Mean field approximation and propagation of chaos for mSGLD; S3 Technical results; S4 Quantitative propagation of chaos; S4.1 Existence of strong solutions to the particle SDE.
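The excerpt's central objects, an $N$-particle SDE and its mean-field limit, can be illustrated numerically. Below is a minimal sketch of propagation of chaos for a generic McKean--Vlasov toy model (not the paper's mSGLD dynamics; the drift, step sizes, and sample counts are all illustrative assumptions): as the number of particles grows, two tagged particles become nearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_particles(n, steps=500, dt=4e-3, sigma=1.0):
    """Euler-Maruyama for the toy interacting system
    dX_i = -(X_i - mean_j X_j) dt + sigma dW_i, returning (X_1, X_2).
    A generic McKean-Vlasov example, not the paper's mSGLD."""
    x = rng.standard_normal(n)
    for _ in range(steps):
        x += -(x - x.mean()) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    return x[0], x[1]

# Propagation of chaos: the correlation between two tagged particles,
# estimated over independent runs, shrinks roughly like 1/n.
for n in (10, 100, 1000):
    pairs = np.array([two_particles(n) for _ in range(500)])
    corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
    print(f"n={n:5d}  corr(X1, X2) ~ {corr:+.3f}")
```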


A Additional related work

Neural Information Processing Systems

Soudry et al. [2018] showed that gradient descent on linearly separable binary classification problems converges in direction to the maximum-margin solution. This analysis was extended to other loss functions, tighter convergence rates, non-separable data, and variants of gradient-based optimization algorithms [Nacson et al., 2019, and others]. As detailed in Section 2, Lyu and Li [2019] and Ji and Telgarsky [2020] showed that GF on homogeneous neural networks with exponential-type losses converges in direction to a KKT point of the maximum-margin problem in parameter space. The implications of margin maximization in parameter space for the implicit bias in predictor space of linear neural networks were studied in Gunasekar et al. [2018b] (as detailed in Section 2) and also in Jagadeesan et al. [2021] and Ergen and Pilanci [2021a,b]. Moreover, several recent works considered implications of convergence to a KKT point of the maximum-margin problem without assuming that the KKT point is optimal: Safran et al. [2022] proved a generalization bound for univariate depth-2 ReLU networks, and Vardi et al. [2022] proved a bias towards non-robust solutions in depth-2 ReLU networks. The implicit bias in predictor space of diagonal and convolutional linear networks was studied in Gunasekar et al. [2018b] and Moroshko et al. [2020]. Lyu et al. [2021] studied the implicit bias of two-layer leaky-ReLU networks trained on linearly separable data. They also gave constructions where a KKT point is not a global max-margin solution; we note that their constructions do not imply any of our results. Finally, the implicit bias of neural networks in regression tasks w.r.t. the square loss was also studied. This setting, however, is less relevant to our work.



Noisy PDE Training Requires Bigger PINNs

Andre-Sloan, Sebastien, Mukherjee, Anirbit, Colbrook, Matthew

arXiv.org Artificial Intelligence

Physics-Informed Neural Networks (PINNs) are increasingly used to approximate solutions of partial differential equations (PDEs), especially in high dimensions. In real-world applications, data samples are noisy, so it is important to know when a predictor can still achieve low empirical risk. However, little is known about the conditions under which a PINN can do so effectively. We prove a lower bound on the size of neural networks required for the supervised PINN empirical risk to fall below the variance of noisy supervision labels. Specifically, if a predictor achieves an empirical risk $O(\eta)$ below $\sigma^2$ (the variance of the supervision data), then necessarily $d_N \log d_N \gtrsim N_s \eta^2$, where $N_s$ is the number of samples and $d_N$ is the number of trainable parameters of the PINN. A similar constraint applies in the fully unsupervised PINN setting when boundary labels are sampled noisily. Consequently, increasing the number of noisy supervision labels alone does not provide a "free lunch" in reducing empirical risk. We also show empirically that PINNs can indeed achieve empirical risks below $\sigma^2$ under such conditions. As a case study, we investigate PINNs applied to the Hamilton--Jacobi--Bellman (HJB) PDE. Our findings lay the groundwork for quantitatively understanding the parameter requirements for training PINNs in the presence of noise.
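To get a rough sense of the scaling in the bound $d_N \log d_N \gtrsim N_s \eta^2$, the sketch below numerically inverts $d \log d = N_s \eta^2 / c$ for the smallest admissible $d$; the constant $c$ is hidden by $\gtrsim$ and is set to 1 here purely as an assumption, so the outputs indicate orders of magnitude only.

```python
import math

def min_params(n_samples: float, eta: float, c: float = 1.0) -> float:
    """Smallest d with c * d * log(d) >= n_samples * eta**2, by bisection.
    The constant c is hidden by the paper's '>~' and is set to 1 here
    purely as an illustrative assumption."""
    target = n_samples * eta**2 / c
    lo, hi = 2.0, 4.0
    while hi * math.log(hi) < target:   # grow the bracket until it covers target
        hi *= 2.0
    for _ in range(100):                # d -> d log d is increasing for d >= 2
        mid = (lo + hi) / 2
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

# The implied parameter count grows almost linearly in the number of
# noisy labels N_s (and quadratically in eta):
for n_s in (10_000, 20_000, 40_000):
    print(f"N_s={n_s:6d}  d_N >~ {min_params(n_s, eta=0.1):.0f}")
```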


A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks

Nguyen, Phan-Minh, Pham, Huy Tuan

arXiv.org Machine Learning

We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's width increases, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the \textit{mean field} limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics and network initializations. Central to the framework is the new idea of a \textit{neuronal embedding}, which comprises a non-evolving probability space that allows one to embed neural networks of arbitrary widths. We demonstrate two applications of our framework. First, the framework gives a principled way to study the simplifying effects that independent and identically distributed initializations have on the mean field limit. Second, we prove a global convergence guarantee for two-layer and three-layer networks. Unlike previous works that rely on convexity, our result requires a certain universal approximation property, which is a distinctive feature of infinite-width neural networks. To the best of our knowledge, this is the first time global convergence has been established for neural networks of more than two layers in the mean field regime.
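As a heavily simplified illustration of the width-independent limiting trajectory described above, the sketch below trains a two-layer network under the mean-field scaling $f(x) = \frac{1}{n}\sum_{i=1}^n a_i \tanh(w_i^\top x)$ with i.i.d. initialization at several widths; the toy data, teacher, and hyperparameters are assumptions for illustration, not taken from the paper. With this scaling, the loss at a fixed training time approximately stops depending on the width as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 5, 64
X = rng.standard_normal((m, d))
y = np.tanh(X @ rng.standard_normal(d))       # fixed toy regression targets

def final_loss(n, steps=500, lr=0.1):
    """Full-batch GD on a two-layer net with mean-field output scaling 1/n.
    Illustrative toy setup; the paper's framework is far more general."""
    w = rng.standard_normal((n, d))           # i.i.d. first-layer init
    a = rng.standard_normal(n)                # i.i.d. second-layer init
    for _ in range(steps):
        h = np.tanh(X @ w.T)                  # (m, n) hidden activations
        r = h @ a / n - y                     # residuals under 1/n output scaling
        # Mean-field learning rate: the O(n) rate multiplier cancels the
        # 1/n in each per-neuron gradient, keeping neuron motion O(1).
        a -= lr * (h.T @ r) / m
        w -= lr * ((r[:, None] * (1 - h**2) * a).T @ X) / m
    return float(np.mean(r**2))

# Final loss becomes (approximately) width-independent as n grows:
for n in (50, 500, 5000):
    print(f"n={n:5d}  final MSE ~ {final_loss(n):.4f}")
```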