Neural Information Processing Systems

We thank the reviewers for carefully reading our paper and providing insightful and constructive comments.

We will update Table 1 of the original manuscript to display this new comparison. Shown are percentages of "successful" solutions x̂ (... < 0.01), and 75th percentiles of the total number of gradient descent steps used (across all networks G).

S(x, θ, τ) is just the set of neurons whose pre-activations are close to zero before ReLU thresholding. These are the neurons whose signs could change after a small change of the network input x. This case is not covered by Theorem 3.1, because y is [...]. Please see our comment starting on line 157.
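To make the definition of S(x, θ, τ) concrete, a minimal NumPy sketch for a single dense ReLU layer could look as follows; the weights W, bias b, and the helper name near_threshold_neurons are illustrative assumptions, not part of the paper:

```python
import numpy as np

def near_threshold_neurons(W, b, x, tau):
    """Indices of neurons whose pre-ReLU activation lies within tau of zero.

    These are exactly the neurons whose sign pattern could flip under a
    small perturbation of the input x (the set S(x, theta, tau)).
    """
    z = W @ x + b                      # pre-activations, before ReLU thresholding
    return np.flatnonzero(np.abs(z) < tau)

# Toy example: neurons 0 and 2 sit within tau = 0.01 of the threshold.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([-0.005, 0.5, -0.995])
x = np.array([0.0, 1.0])
print(near_threshold_neurons(W, b, x, tau=0.01))  # → [0 2]
```

For a deep network the same test would be applied to the pre-activations of every layer along the forward pass.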