A Study of Neural Training with Non-Gradient and Noise Assisted Gradient Methods
Anirbit Mukherjee, Ramchandran Muthukumar
Eventually this led to an explosion of literature obtaining linear-time training guarantees for various kinds of neural nets when their width is a high-degree polynomial in the training set size, inverse accuracy, and inverse confidence parameters (a somewhat unrealistic regime), [26], [39], [11], [37], [22], [17], [3], [2], [4], [10], [42], [43], [7], [8], [29], [6]. The essential proximity of this regime to kernel methods has been studied separately in works like [1], [38]. Even in the wake of this progress, it remains unclear how any of this can help establish rigorous guarantees for smaller neural networks, or more pertinently for constant-size neural nets, a regime closer to what is implemented in the real world.
Aug-27-2020