to

### Collaborating Authors

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the lingering'' of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k 1},x_{k 2},\dots$ may be reduced. We show how this improves the running time of gradient descent and SVRG. For instance, if the "additional time'' scales linearly with respect to the traveled distance, then the "convergence rate'' of gradient descent can be improved from $1/T$ to $\exp(-T {1/3})$. On the empirical side, we solve a hypothetical revenue management problem on the Yahoo!

Now, what was the Gradient Descent algorithm? Above algorithm says, to perform the GD, we need to calculate the gradient of the cost function J. And to calculate the gradient of the cost function, we need to sum (yellow circle!) the cost of each sample. If we have 3 million samples, we have to loop through 3 million times or use the dot product. If you insist to use GD.

### Understanding the 3 Primary Types of Gradient Descent

Gradient descent is the most commonly used optimization method deployed in machine learning and deep learning algorithms. It's used to train a machine learning model and is based on a convex function. It does this to minimize a given cost function to its local minimum. Gradient descent was invented by French mathematician Louis Augustin Cauchy in 1847. Most machine learning and deep learning algorithms involve some sort of optimization.

Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (a) using the past experience to estimate {\em only} the gradient of the expected return $U(\theta)$ at the current policy parameterization $\theta$, rather than to obtain a more complete estimate of $U(\theta)$, and (b) using past experience under the current policy {\em only} rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines---a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds.