The environment is the setting that the agent is acting on and the agent represents the RL algorithm. To understand this better, let's suppose that our agent is learning to play counterstrike. The mathematical approach for mapping a solution in Reinforcement Learning is called Markov's Decision Process (MDP). To briefly sum it up, the agent must take an action (A) to transition from the start state to the end state (S). While doing so, the agent receives rewards (R) for each action he takes.
Jan-24-2022, 22:00:11 GMT