Directional convergence and alignment in deep learning
–Neural Information Processing Systems
The above theories, with finite width networks, usually require the weights to stay close to initialization in certain norms. By contrast, practitioners run their optimization methods as long as their computational budget allows [Shallue et al., 2018], and if the data can be perfectly classified, the
Neural Information Processing Systems
Nov-15-2025, 07:33:32 GMT
- Country:
- Asia > Japan (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States > Illinois
- Champaign County > Urbana (0.04)
- Canada > Ontario
- Technology: