Neural Information Processing Systems

We thank the reviewers for carefully reading our paper and providing insightful and constructive comments.

We will update Table 1 of the original manuscript to display this new comparison. Shown are percentages of "successful" solutions x̂ (... < 0.01), and 75th percentiles of the total number of gradient descent steps used (across all networks G).

S(x, θ, τ) is just the set of neurons whose pre-activations are close to zero before ReLU thresholding. These are the neurons whose signs could change after a small change of the network input x. This case is not covered by Theorem 3.1, because y is [...]. Please see our comment starting on line 157.
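To make the definition of S(x, θ, τ) concrete, a minimal NumPy sketch for a single dense ReLU layer could look as follows; the weights W, bias b, and the helper name near_threshold_neurons are illustrative assumptions, not part of the paper:

```python
import numpy as np

def near_threshold_neurons(W, b, x, tau):
    """Indices of neurons whose pre-ReLU activation lies within tau of zero.

    These are exactly the neurons whose sign pattern could flip under a
    small perturbation of the input x (the set S(x, theta, tau)).
    """
    z = W @ x + b                      # pre-activations, before ReLU thresholding
    return np.flatnonzero(np.abs(z) < tau)

# Toy example: neurons 0 and 2 sit within tau = 0.01 of the threshold.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([-0.005, 0.5, -0.995])
x = np.array([0.0, 1.0])
print(near_threshold_neurons(W, b, x, tau=0.01))  # → [0 2]
```

For a deep network the same test would be applied to the pre-activations of every layer along the forward pass.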