Understanding Reinforcement Learning Hands-On: The Bellman Equation pt.1
Welcome to the fifth entry in a series on Reinforcement Learning. In the previous article, we presented the MDP Framework for describing complex environments. This allowed us to build a richer, more varied version of the basic Multi-Armed Bandits problem, which we called the Casinos Environment. We then implemented this scenario using OpenAI's gym, and wrote a simple agent that acts randomly to show how an interaction plays out under the MDP Framework. Today, we turn our attention back to the agents, and show one way to describe an agent's behavior in complex scenarios, where past actions determine future rewards.
Oct-25-2020, 19:10:05 GMT