"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them." – Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
However,extrinsic rewards may be insufficiently informative to encourage an agent to explore and understand its environment, particularly in partially observed settings where the agent has a limited view of its environment.
Deep learning algorithms have shown significant development thanks tothelargepre-collected dataset, such as SQuAD [47]innatural language processing (NLP), and ImageNet [4] in computer vision.