Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Neural Information Processing Systems
If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase must end within finitely many steps.
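The claim can be illustrated numerically. The sketch below is a hypothetical toy setup (not the paper's construction): linearly separable 2D data, a two-layer network with the non-homogeneous activation φ(z) = z + 0.5·tanh(z), whose derivative φ'(z) = 1 + 0.5(1 − tanh²z) lies in [1, 1.5] and is thus bounded away from zero, trained by full-batch gradient descent with a deliberately large stepsize. The per-step risk may oscillate, but the running average of the empirical risk decreases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: labels from the sign of the first
# coordinate, then pushed away from the boundary to create a margin.
n, d = 40, 2
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])
X[:, 0] += 0.5 * y

# Non-homogeneous activation with derivative bounded away from zero:
# phi'(z) = 1 + 0.5*(1 - tanh(z)^2) lies in [1, 1.5].
def phi(z):  return z + 0.5 * np.tanh(z)
def dphi(z): return 1.0 + 0.5 * (1.0 - np.tanh(z) ** 2)

m = 8                                  # hidden width (illustrative choice)
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)

def risk(W, a):
    f = phi(X @ W.T) @ a               # network outputs, shape (n,)
    return np.logaddexp(0.0, -y * f).mean()   # stable logistic loss

eta, T = 2.0, 300                      # deliberately large stepsize
risks = []
for t in range(T):
    Z = X @ W.T                        # pre-activations, (n, m)
    f = phi(Z) @ a
    margin = np.clip(y * f, -50, 50)   # clip for numerical stability
    g = -y / (1.0 + np.exp(margin))    # d loss / d output, per sample
    grad_a = phi(Z).T @ g / n
    grad_W = (dphi(Z) * g[:, None] * a[None, :]).T @ X / n
    a -= eta * grad_a                  # gradient step on both layers
    W -= eta * grad_W
    risks.append(risk(W, a))

avg = np.cumsum(risks) / np.arange(1, T + 1)   # running average risk
print(avg[0], avg[-1])
```

In this toy run the running average of the risk at the end is below its initial value, consistent with the stated result; the per-iteration risk itself need not decrease monotonically at this stepsize.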