"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them." – Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
We further propose new training methods to disentangle the embeddings, making them both distinctive signatures of the environments and tasks and effective building blocks for composing the policies.
In the reinforcement learning context, an Option means a temporally extended sequence of actions [30], and is regarded as useful for many purposes, such as speeding up learning, transferring skills across domains, and solving long-term planning problems.
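
To make the idea of a temporally extended action concrete, here is a minimal sketch in Python. It is an illustration only, not the formulation of [30]: the `Option` class, the toy corridor environment in `step`, and the helper `run_option` are all hypothetical names introduced here, and the option is simplified to a fixed, open-loop list of primitive actions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Option:
    """Hypothetical open-loop option: a fixed sequence of primitive actions."""
    name: str
    actions: List[int]


def step(state: int, action: int) -> Tuple[int, float]:
    """Toy 1-D corridor with positions 0..5 (assumed for illustration).

    The action is -1 (left) or +1 (right). A reward of 1.0 is given only
    on the transition that first reaches position 5.
    """
    next_state = max(0, min(5, state + action))
    reward = 1.0 if (next_state == 5 and state != 5) else 0.0
    return next_state, reward


def run_option(state: int, option: Option) -> Tuple[int, float, int]:
    """Execute all primitive actions of the option in order.

    Returns the final state, the summed reward, and the number of
    primitive steps consumed.
    """
    total_reward = 0.0
    for action in option.actions:
        state, reward = step(state, action)
        total_reward += reward
    return state, total_reward, len(option.actions)


# One decision by the agent ("take go_right_3") spans three primitive steps.
go_right_3 = Option(name="right_x3", actions=[1, 1, 1])
state, reward, steps = run_option(0, go_right_3)
state2, reward2, steps2 = run_option(state, go_right_3)
```

From the agent's point of view, each call to `run_option` is a single choice that covers several environment steps. This is the sense in which such an abstraction could shorten the planning horizon: reaching position 5 from position 0 takes two option-level decisions instead of five primitive ones.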