"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them." – Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
Subsequently, LfVoid trains an ensembled goal discriminator on the generated image to provide reward signals for a reinforcement learning agent, guiding it to achieve the goal.
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs).
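As a point of reference, the standard GPI-with-successor-features action rule that this line of work builds on can be sketched as follows. This is a minimal illustration, not the model-based method introduced here: it assumes rewards are linear in some features with task weights w, so that each stored policy's action value is the dot product of its successor features with w. The names `q_values`, `gpi_action`, `psis`, and `w_new`, and all numbers, are hypothetical.

```python
from typing import List, Sequence, Tuple


def q_values(psi: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Return Q(s, a) = psi(s, a) . w for every action a.

    psi[a] is the successor-feature vector of one policy for action a
    in the current state; w is the task's reward-weight vector.
    """
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(
    psis: Sequence[Sequence[Sequence[float]]], w: Sequence[float]
) -> Tuple[int, List[float]]:
    """Generalized policy improvement over a set of stored policies.

    psis[i][a] is the successor-feature vector of policy i for action a.
    The chosen action maximizes, over actions, the best value that any
    stored policy assigns to that action on the new task w.
    """
    n_actions = len(psis[0])
    per_policy_q = [q_values(psi, w) for psi in psis]
    best_q = [max(q[a] for q in per_policy_q) for a in range(n_actions)]
    action = max(range(n_actions), key=lambda a: best_q[a])
    return action, best_q


# Hypothetical successor features for two stored policies, two actions,
# and two-dimensional features, all evaluated at a single state.
psis = [
    [[1.0, 0.0], [0.0, 0.5]],  # policy 0: action 0, action 1
    [[0.2, 0.2], [0.0, 1.0]],  # policy 1: action 0, action 1
]
w_new = [0.0, 1.0]  # hypothetical new task: only the second feature is rewarded

action, best_q = gpi_action(psis, w_new)
print(action, best_q)
```

No learning happens on the new task: only the weight vector changes, which is what makes the transfer zero-shot in this simplified setting.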
In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical in leveraging previously inapplicable resources (e.g.
We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods.
Object-oriented representations in reinforcement learning have shown promise in transfer learning, with previous research introducing a propositional object-oriented framework that has provably efficient learning bounds with respect to sample complexity.