Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms
Neural Information Processing Systems
When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them hard to optimize even in isolation. In such cases, the loss function is the bottleneck when training a deep neural network.
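To illustrate the kind of relaxation the abstract refers to, here is a minimal sketch (not the paper's method) of a differentiable ranking surrogate: the hard rank of each score is replaced by a sum of pairwise sigmoids with temperature `tau`. The function name and parameterization are illustrative assumptions; note how small temperatures recover hard ranks but saturate the sigmoids, which is exactly the vanishing-gradient issue described above.

```python
import numpy as np

def soft_rank(scores, tau=1.0):
    # Differentiable relaxation of ranking (illustrative sketch):
    # rank_i ~ 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    # As tau -> 0 this approaches the hard, non-differentiable rank,
    # but the sigmoids saturate and their gradients vanish.
    s = np.asarray(scores, dtype=float)
    diff = (s[None, :] - s[:, None]) / tau   # pairwise score differences
    sig = 1.0 / (1.0 + np.exp(-diff))        # soft pairwise comparisons
    np.fill_diagonal(sig, 0.0)               # exclude self-comparisons
    return 1.0 + sig.sum(axis=1)

scores = [2.0, 0.5, 1.0]
print(soft_rank(scores, tau=0.1))   # close to the hard ranks [1, 3, 2]
print(soft_rank(scores, tau=10.0))  # heavily smoothed, ranks pulled together
```

With a small `tau` the output is nearly the exact ranking; with a large `tau` the relaxation is smooth and well-conditioned for gradient descent but far from the true objective, which is the trade-off that motivates using curvature information in the loss.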