Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning
Thomas Chen, Patricia Muñoz Ewald
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks generically cannot be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
Nov-12-2023