Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.
Remaining a gradient-based method, it is also the first Bayesian model-agnostic meta-learning method applicable to various tasks including reinforcement learning.
Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence on users' evaluation of system's
Potential-based reward shaping incorporates prior domain knowledge in the form of additional rewards provided during training to speed up convergence of reinforcement learning algorithms, without changing the optimal policies (Ng et al. [1999]).