Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems 

Reinforcement learning (RL) [1] studies the problem of learning in sequential decision-making problems where the dynamics of the environment are unknown but can be learnt by performing actions and observing their outcomes in an online fashion. A sample-efficient RL agent must trade off the exploration needed to collect information about the environment against the exploitation of the experience gathered so far to gain as much reward as possible.
