Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

Neural Information Processing Systems 

If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase (in which the risk oscillates under the large stepsize) must end within finitely many steps.
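The setting can be made concrete with a small numerical sketch. The snippet below (not the paper's code; all hyperparameters, the leaky-ReLU activation, and the toy data are illustrative assumptions) trains a two-layer network whose activation derivative is bounded below by a constant alpha > 0 on linearly separable data, using a deliberately large gradient-descent stepsize, and tracks the running average of the empirical risk, the quantity the result says must decrease.

```python
# Minimal sketch (assumptions, not the paper's experiments): large-stepsize GD
# on a two-layer net with leaky ReLU, whose derivative is >= alpha > 0,
# trained on linearly separable data with the logistic loss.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 64, 2, 16                      # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] > 0, 1.0, -1.0)     # linearly separable labels

alpha = 0.2                              # leaky-ReLU slope: phi'(z) >= alpha
W = rng.normal(size=(m, d)) / np.sqrt(d) # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

def phi(z):                              # leaky ReLU
    return np.where(z > 0, z, alpha * z)

def dphi(z):                             # derivative, bounded below by alpha
    return np.where(z > 0, 1.0, alpha)

def risk(W):
    f = phi(X @ W.T) @ a                 # network outputs
    return np.mean(np.logaddexp(0.0, -y * f))  # logistic empirical risk

eta = 8.0                                # large stepsize (illustrative choice)
risks = []
for t in range(200):
    Z = X @ W.T                          # pre-activations, shape (n, m)
    f = phi(Z) @ a
    g = -y / (1.0 + np.exp(y * f))       # dloss/df for each sample
    # Gradient wrt W: average over i of g_i * a_j * phi'(z_ij) * x_i
    grad = ((g[:, None] * dphi(Z)) * a[None, :]).T @ X / n
    W -= eta * grad
    risks.append(risk(W))

# Per-step risk may oscillate in the first phase; the averaged risk should
# still trend downward, consistent with the stated result.
print("avg risk, first 50 steps:", np.mean(risks[:50]))
print("avg risk, last 50 steps: ", np.mean(risks[-50:]))
```

Leaky ReLU is used here only because its derivative is bounded away from zero, matching the stated assumption; any activation with that property would serve the same illustrative purpose.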
