Asynchronous n-step Q-learning
Q-learning is one of the best-known temporal-difference (TD) algorithms. The original Q-learning algorithm tries to find the state-action value function Q(s, a) that minimizes the temporal-difference error, i.e. the gap between the current estimate Q(s, a) and the target r + γ max_a' Q(s', a'). We will use an optimizer, the simplest one (gradient descent), to compute the values of the state-action function. Gradient descent finds a minimum of a function by repeatedly subtracting the gradient of the function with respect to its parameters, scaled by a learning rate, from those parameters. So, first of all, we need to compute the gradient of the loss function.
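The loss-and-gradient procedure described above can be sketched in Python. This is a minimal illustration under assumptions made for this example, not the original author's implementation: Q is a linear function Q(s, a; w) = w_a · φ(s) of hypothetical state features φ(s), the error is the one-step squared TD error L(w) = ½ (y − Q(s, a; w))² with target y = r + γ max_a' Q(s', a'; w), and y is treated as a constant when differentiating (the common "semi-gradient" convention). Under those assumptions ∂L/∂w_a = −(y − Q(s, a; w)) φ(s), and one gradient-descent step subtracts lr times that gradient from w_a. All function names are hypothetical.

```python
from typing import List

Vector = List[float]
Weights = List[Vector]  # hypothetical layout: one weight row w_a per action


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def q_value(weights: Weights, features: Vector, action: int) -> float:
    """Linear state-action value Q(s, a; w) = w_a . phi(s)."""
    return dot(weights[action], features)


def td_target(weights: Weights, reward: float, next_features: Vector,
              gamma: float, terminal: bool) -> float:
    """One-step Q-learning target y = r + gamma * max_a' Q(s', a'; w)."""
    if terminal:
        return reward  # no bootstrapping past the end of an episode
    best_next = max(q_value(weights, next_features, a) for a in range(len(weights)))
    return reward + gamma * best_next


def q_learning_loss(weights: Weights, features: Vector, action: int, target: float) -> float:
    """Squared TD error L(w) = 0.5 * (y - Q(s, a; w))^2 for a fixed target y."""
    error = target - q_value(weights, features, action)
    return 0.5 * error * error


def gradient_step(weights: Weights, features: Vector, action: int, reward: float,
                  next_features: Vector, gamma: float, lr: float,
                  terminal: bool = False) -> Weights:
    """Return new weights after one gradient-descent step on the squared TD error.

    The target y is treated as a constant (semi-gradient assumption), so
    dL/dw_action = -(y - Q(s, a; w)) * phi(s); rows for other actions get zero gradient.
    """
    target = td_target(weights, reward, next_features, gamma, terminal)
    error = target - q_value(weights, features, action)
    grad = [-error * x for x in features]
    new_weights = [row[:] for row in weights]  # copy so the caller's weights are untouched
    new_weights[action] = [w - lr * g for w, g in zip(new_weights[action], grad)]
    return new_weights


if __name__ == "__main__":
    phi_s, phi_next = [1.0, 0.0], [0.0, 1.0]
    w = [[0.0, 1.0], [0.5, 0.0]]
    y = td_target(w, reward=0.0, next_features=phi_next, gamma=0.9, terminal=False)
    w_new = gradient_step(w, phi_s, 1, 0.0, phi_next, gamma=0.9, lr=0.1)
    # Loss before and after the step, measured against the same fixed target y.
    print(q_learning_loss(w, phi_s, 1, y), q_learning_loss(w_new, phi_s, 1, y))
```

In the usage example the second printed loss is smaller than the first, as expected from a descent step. With one-hot features φ(s), each row w_a simply stores Q(s, a) for every state, and the step reduces to the familiar tabular update Q(s, a) ← Q(s, a) + α (y − Q(s, a)); the ½ factor in the loss is chosen so that no extra factor of 2 appears.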
Feb-25-2017, 16:25:14 GMT