Asynchronous n-steps Q-learning

Feb-25-2017, 16:25:14 GMT–#artificialintelligence

Q-learning is the best-known temporal-difference algorithm. The original Q-learning algorithm tries to determine the state-action value function that minimizes the temporal-difference error, that is, the gap between the current estimate and the bootstrapped target. We will use an optimizer (the simplest one, gradient descent) to compute the values of the state-action function. First, we need to compute the gradient of the loss function with respect to the parameters of the function. Gradient descent then searches for a minimum of the function by repeatedly subtracting that gradient, scaled by a learning rate, from the parameters.
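As a rough illustration of the idea, the sketch below performs a single gradient-descent step for a tabular state-action function. It is a minimal sketch under stated assumptions, not the article's implementation: the loss is assumed to be half the squared temporal-difference error, the bootstrapped target is treated as a constant when differentiating, and the names `q_learning_step`, `gamma`, and `lr` are hypothetical choices made for this example.

```python
import numpy as np


def q_learning_step(q, state, action, reward, next_state, done,
                    gamma=0.99, lr=0.1):
    """One gradient-descent step on 0.5 * (target - q[state, action])**2.

    Assumption: q is a 2-D array indexed as q[state, action], and the
    target is held constant when taking the gradient.
    """
    # Bootstrapped target: reward plus discounted best next-state value.
    if done:
        target = reward
    else:
        target = reward + gamma * np.max(q[next_state])

    td_error = target - q[state, action]

    # Derivative of the assumed loss with respect to q[state, action].
    grad = -td_error

    # Gradient descent: subtract the scaled gradient from the parameter.
    q[state, action] -= lr * grad
    return td_error


# Toy example: 2 states, 2 actions, all values initialised to zero.
q = np.zeros((2, 2))
td = q_learning_step(q, state=0, action=1, reward=1.0,
                     next_state=1, done=False)
```

With these assumptions, subtracting the scaled gradient is the same as moving `q[state, action]` a fraction `lr` of the way toward the target, which is the familiar tabular Q-learning update.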
