Dissecting Reinforcement Learning-Part.3

#artificialintelligence 

The update rule is based on the tuple State-Reward-State. Remember that we are now in the control case, where we use the Q-function (see the second post) to estimate the best policy. The Q-function takes a state-action pair as input.
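
To make the contrast concrete, here is a minimal sketch in Python. It assumes the State-Reward-State rule refers to a TD(0)-style update of a state-value table, and that the Q-function is stored as a table indexed by (state, action). The names `td0_update` and `greedy_action`, the table sizes, and the values of `ALPHA` and `GAMMA` are hypothetical, chosen only for illustration and not taken from the original post.

```python
# Hypothetical sizes and hyperparameters, chosen only for illustration.
N_STATES = 3
N_ACTIONS = 2
ALPHA = 0.1   # step size (assumed value)
GAMMA = 0.9   # discount factor (assumed value)


def td0_update(V, state, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Update V in place from one State-Reward-State tuple (TD(0)-style)."""
    td_target = reward + gamma * V[next_state]
    V[state] += alpha * (td_target - V[state])
    return V


def greedy_action(Q, state):
    """Return the action with the highest Q-value in the given state."""
    return max(range(len(Q[state])), key=lambda a: Q[state][a])


# State-value table: one entry per state.
V = [0.0] * N_STATES
# Q-table: one entry per state-action pair.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# Prediction: the update only needs (state, reward, next_state).
td0_update(V, state=0, reward=1.0, next_state=1)

# Control: the Q-function is queried with a state-action pair,
# so a policy can be read off by comparing actions within a state.
Q[0][1] = 0.5
best = greedy_action(Q, 0)
```

The state-value table alone says nothing about which action to take, whereas the Q-table stores one value per state-action pair, which is what makes it usable for estimating a policy.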
