Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Thomas Chen, Patricia Muñoz Ewald

arXiv.org, Machine Learning

In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of the training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of the training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error of size $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace of the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in this context.
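As a rough illustration of the construction outlined above, the following minimal NumPy sketch builds a one-hidden-layer ramp (ReLU) network from class averages: hidden weights are chosen as a left inverse of the matrix of averages, so each $\overline{x_{0,j}}$ is mapped to the $j$-th coordinate vector, and the output layer sends coordinate vectors to the targets $y_j$. This is an assumption-laden sketch of the projection idea, not the authors' exact optimizer; all names and the toy data are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def constructive_network(X, labels, Y):
    """Projection-style approximate minimizer (illustrative sketch only).

    X      : (N, M) training inputs
    labels : (N,)   index j of the output vector y_j for each input
    Y      : (Q, Q) target output vectors y_j as rows
    """
    Q = Y.shape[0]
    # Averages xbar_j of the training inputs sharing the output y_j.
    Xbar = np.stack([X[labels == j].mean(axis=0) for j in range(Q)])  # (Q, M)

    # Hidden layer: left inverse of the matrix of averages, so that
    # W1 @ xbar_j = e_j (assumes the averages span a Q-dimensional
    # subspace of R^M, as in the setting of the abstract).
    W1 = np.linalg.pinv(Xbar.T)   # (Q, M)

    # Output layer: columns are the targets, so W2 @ e_j = y_j.
    W2 = Y.T                      # (Q, Q)

    return lambda x: W2 @ relu(W1 @ x)

# Toy usage: Q = 3 output classes in R^5, inputs = centers + small noise.
rng = np.random.default_rng(0)
M, Q, n = 5, 3, 40
centers = rng.normal(size=(Q, M))
labels = np.repeat(np.arange(Q), n)
X = centers[labels] + 0.05 * rng.normal(size=(Q * n, M))
Y = np.eye(Q)

f = constructive_network(X, labels, Y)
print(np.mean([np.linalg.norm(f(x) - Y[j]) for x, j in zip(X, labels)]))
```

For inputs close to their class average, $W_1 x$ is a small perturbation of the coordinate vector $e_j$, so the reported mean error is of the order of the noise level in the toy data, loosely mirroring the $O(\delta_P)$ upper bound described above.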
