Reinforcement Q-Learning from Scratch in Python with OpenAI Gym – LearnDataSci
Essentially, Q-learning lets the agent use the environment's rewards to learn, over time, the best action to take in a given state. In our Taxi environment, we have the reward table, P, that the agent will learn from. The agent learns by receiving a reward for taking an action in the current state, then updating a Q-value to remember whether that action was beneficial. The values stored in the Q-table are called Q-values, and each one maps to a (state, action) combination. A Q-value for a particular state-action combination represents the "quality" of taking that action from that state.
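To make the idea of a Q-table concrete, here is a minimal sketch in plain Python. It assumes a Taxi-sized table of 500 states and 6 actions, and it uses illustrative values for the learning rate `alpha` and discount factor `gamma`, neither of which appears in this excerpt. The update applied is the standard Q-learning rule, Q(s, a) ← (1 − alpha)·Q(s, a) + alpha·(r + gamma·max over a' of Q(s', a')). The helper names `update_q_value` and `best_action` are hypothetical and are not taken from the article or from Gym.

```python
# Minimal sketch of a Q-table and the standard Q-learning update.
# Assumptions (not from the original text): 500 states and 6 actions,
# matching the size of Gym's Taxi environment; alpha and gamma defaults
# are illustrative placeholders. Helper names are hypothetical.

N_STATES = 500
N_ACTIONS = 6

# Q-table: one row per state, one column per action, initialised to zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def update_q_value(q_table, state, action, reward, next_state,
                   alpha=0.1, gamma=0.6):
    """Apply one Q-learning update to the (state, action) entry in place."""
    old_value = q_table[state][action]
    # Best value the agent currently expects to obtain from the next state.
    next_max = max(q_table[next_state])
    # Blend the old estimate with the reward plus the discounted future value.
    new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
    q_table[state][action] = new_value
    return new_value


def best_action(q_table, state):
    """Return the index of the action with the highest Q-value in this state."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)
```

For example, starting from an all-zero table, calling `update_q_value(q_table, 328, 1, -1, 428)` lowers that entry to about -0.1, so `best_action(q_table, 328)` would afterwards prefer one of the actions whose Q-value is still zero. Repeating such updates over many episodes is what gradually fills the table with useful Q-values.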
Jun-14-2018, 09:41:29 GMT