 Gradient Descent


79ec2a4246feb2126ecf43c4a4418002-Paper.pdf

Neural Information Processing Systems

We formulate the decoding process as an optimization problem, which allows the multiple attributes we aim to control to be easily incorporated as differentiable constraints to the optimization. By relaxing this discrete optimization to a continuous one, we make use of Lagrangian multipliers and gradient-descent based techniques to generate the desired text.
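The general idea of handling a differentiable constraint with a Lagrangian multiplier and gradient steps can be illustrated on a toy problem. The sketch below is not the paper's decoding method: it assumes a hypothetical scalar objective f(x) = (x - 3)^2 with the constraint g(x) = x - 1 <= 0, and runs gradient descent on x with projected gradient ascent on the multiplier. The function name, step size, and iteration count are illustrative choices.

```python
def lagrangian_descent_ascent(steps=2000, lr=0.1):
    """Toy constrained minimization (illustrative assumption, not the paper's setup).

    Minimize f(x) = (x - 3)^2 subject to g(x) = x - 1 <= 0 using the
    Lagrangian L(x, lam) = f(x) + lam * g(x).
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 3.0) + lam  # dL/dx
        grad_lam = x - 1.0              # dL/dlam, which equals g(x)
        # Simultaneous update: descend in x, ascend in lam.
        # The multiplier is projected onto lam >= 0.
        x, lam = x - lr * grad_x, max(0.0, lam + lr * grad_lam)
    return x, lam


if __name__ == "__main__":
    x_opt, lam_opt = lagrangian_descent_ascent()
    print(x_opt, lam_opt)
```

For this toy problem the constrained minimizer is x = 1, and the stationarity condition 2(x - 3) + lam = 0 gives lam = 4, so the iterates should settle near those values.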


Escaping Saddle-Point Faster under Interpolation-like Conditions

Neural Information Processing Systems

One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an ε-local-minimizer matches the corresponding deterministic rate of O(1/ε^2).
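A minimal sketch of the perturbation idea, under several assumptions that do not come from the paper: the toy function f(x, y) = x^4/4 - x^2/2 + y^2/2 has a saddle point at the origin and minima at (±1, 0); the stochastic gradient is modeled as the true gradient times (1 + sigma * z) with Gaussian z, a loose stand-in for noise that vanishes where the gradient vanishes; and the thresholds, radius, and step size are hypothetical values chosen for illustration rather than the constants analyzed in the paper.

```python
import math
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = x**4/4 - x**2/2 + y**2/2.
    return x**3 - x, y


def perturbed_sgd(steps=2000, lr=0.1, sigma=0.5, g_thresh=1e-3,
                  radius=1e-2, t_thresh=50, seed=0):
    """Illustrative perturbed SGD started exactly at the saddle (0, 0)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    last_perturb = -t_thresh
    for t in range(steps):
        gx, gy = grad(x, y)
        # When the gradient is small and no perturbation was added recently,
        # jump to a point drawn uniformly from a small ball around the iterate.
        if math.hypot(gx, gy) < g_thresh and t - last_perturb >= t_thresh:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            x += r * math.cos(angle)
            y += r * math.sin(angle)
            last_perturb = t
            gx, gy = grad(x, y)
        # Assumed noise model: multiplicative, so it shrinks with the gradient.
        sx = gx * (1.0 + sigma * rng.gauss(0.0, 1.0))
        sy = gy * (1.0 + sigma * rng.gauss(0.0, 1.0))
        x -= lr * sx
        y -= lr * sy
    return x, y


if __name__ == "__main__":
    print(perturbed_sgd())
```

Without the perturbation step, an iterate started at the origin would never move, because both the gradient and the multiplicative noise are exactly zero there. With it, the iterate leaves the saddle along the negative-curvature direction and ends close to one of the two minima.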