Pinaki Laskar posted on LinkedIn
What is the potential of deep reinforcement learning? The goal of a #reinforcementlearning agent, interacting with its environment in discrete time steps, is to learn a policy π: A × S → [0,1] that maximizes the expected cumulative reward R (or minimizes a regret function, measured as the difference in value between the decision made and the optimal decision). The policy map gives the probability π(a, s) = Pr(a | s) of taking action a when in state s. Reinforcement learning, also known as approximate dynamic #programming or neuro-dynamic programming, is modeled as a Markov decision process (MDP). The whole idea is restricted by the standard anthropomorphic #AI model, which treats the AI system as optimizing a fixed objective, and that model must be replaced.
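To make the policy definition concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration and not from the post: the two-state MDP, the names TRANSITIONS, POLICY, and estimate_expected_return, the discount factor gamma, and the finite horizon. It stores Pr(a | s) as a table and estimates the expected cumulative (discounted) reward by averaging sampled episodes.

```python
import random

# Hypothetical toy MDP (illustrative only): two states, two actions.
# TRANSITIONS[(state, action)] -> (next_state, reward); deterministic for simplicity.
TRANSITIONS = {
    ("s0", "stay"): ("s0", 0.0),
    ("s0", "move"): ("s1", 1.0),
    ("s1", "stay"): ("s1", 2.0),
    ("s1", "move"): ("s0", 0.0),
}

# Stochastic policy table: POLICY[s][a] = Pr(a | s); each row sums to 1.
POLICY = {
    "s0": {"stay": 0.2, "move": 0.8},
    "s1": {"stay": 0.9, "move": 0.1},
}


def sample_action(policy, state, rng):
    """Draw an action a with probability Pr(a | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def rollout_return(policy, start, horizon, gamma, rng):
    """Run one episode and return its discounted cumulative reward."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        action = sample_action(policy, state, rng)
        state, reward = TRANSITIONS[(state, action)]
        total += discount * reward
        discount *= gamma
    return total


def estimate_expected_return(policy, start="s0", horizon=20, gamma=0.9,
                             episodes=2000, seed=0):
    """Monte Carlo estimate of the expected cumulative reward under the policy."""
    rng = random.Random(seed)
    returns = [rollout_return(policy, start, horizon, gamma, rng)
               for _ in range(episodes)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print(estimate_expected_return(POLICY))
```

A learning algorithm would adjust the entries of POLICY to push this estimate up; the sketch only evaluates one fixed policy and does no learning.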
Oct-16-2020, 17:35:25 GMT