where $\ell = 1, 2, \ldots, L$ indexes the hidden layers ($\psi^{(1)}(r_i) = \psi(r_i)$, and $L$ is the final layer), ReLU is the nonlinear activation function, $W^{(\ell)} \in \mathbb{R}^{N \times N}$ is the weight matrix in layer $\ell$, and b
