 Reinforcement Learning





Reinforcement Learning with Feedback Graphs

Neural Information Processing Systems

We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transition samples.
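To make the setting concrete, here is a minimal sketch of how an empirical tabular model could absorb several transition samples in a single step. The class `TabularModel`, its methods, and the example states and actions are hypothetical names chosen for illustration; they are not taken from the paper, and the paper's actual algorithm is not reproduced here.

```python
from collections import defaultdict


class TabularModel:
    """Empirical transition and reward model for a tabular MDP (illustrative only)."""

    def __init__(self):
        # counts[(s, a)][s_next] = number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))
        # reward_sums[(s, a)] = sum of observed rewards for that pair
        self.reward_sums = defaultdict(float)

    def update(self, samples):
        """Absorb every (s, a, r, s_next) sample observed in one step."""
        for s, a, r, s_next in samples:
            self.counts[(s, a)][s_next] += 1
            self.reward_sums[(s, a)] += r

    def visits(self, s, a):
        return sum(self.counts[(s, a)].values())

    def transition_prob(self, s, a, s_next):
        n = self.visits(s, a)
        return self.counts[(s, a)][s_next] / n if n else 0.0

    def mean_reward(self, s, a):
        n = self.visits(s, a)
        return self.reward_sums[(s, a)] / n if n else 0.0


model = TabularModel()
# One environment step: the agent's own transition plus one extra
# transition sample for a different state-action pair (assumed data).
model.update([(0, "left", 1.0, 1), (2, "right", 0.0, 3)])
model.update([(0, "left", 0.0, 0)])
```

With more than one sample per step, pairs the agent did not play itself still accumulate counts, which is the intuition behind learning faster from side observations.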





Supplements of "Non-crossing quantile regression in deep reinforcement learning"

Neural Information Processing Systems

We first introduce the following lemma, which is used to complete the proof of Lemma 1. Lemma. Consider an MDP with countable state and action spaces. Therefore, the inequality (4) holds, which completes the proof. Now we give the proof of Lemma 1. Lemma 1. The proof is similar to that of Proposition 2 of [1]. We assume that instantaneous rewards given a state-action pair are deterministic; the general case is a straightforward generalization using the regular probability argument.


Non-crossing quantile regression for deep reinforcement learning

Neural Information Processing Systems

Distributional reinforcement learning (DRL) estimates the distribution over future returns instead of the mean to more efficiently capture the intrinsic uncertainty of MDPs.
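As a rough illustration of the idea, the sketch below estimates a return distribution through a handful of quantiles, scores them with the standard quantile-regression (pinball) loss, and enforces the non-crossing property by building the quantiles from strictly positive increments. The function names, the softplus parameterization, and the synthetic returns are assumptions made for this example; the paper's own network architecture may enforce the constraint differently.

```python
import numpy as np


def pinball_loss(samples, quantile_values, taus):
    """Average quantile-regression (pinball) loss of the quantile estimates."""
    # diff[i, j] = sample_i - estimate_j
    diff = samples[:, None] - quantile_values[None, :]
    return np.mean(np.maximum(taus * diff, (taus - 1.0) * diff))


def monotone_quantiles(offset, raw_increments):
    """Map unconstrained parameters to strictly increasing quantile estimates."""
    # softplus keeps every increment strictly positive, so the
    # cumulative sums can never cross
    increments = np.logaddexp(0.0, raw_increments)
    return offset + np.concatenate(([0.0], np.cumsum(increments)))


taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
rng = np.random.default_rng(0)
# Synthetic stand-in for sampled returns (assumed data)
returns = rng.normal(loc=1.0, scale=2.0, size=1000)

# Five quantile estimates from one offset and four unconstrained parameters
q = monotone_quantiles(-2.0, np.zeros(4))
loss = pinball_loss(returns, q, taus)
```

Because monotonicity is built into the parameterization, any optimizer that lowers `loss` with respect to `offset` and `raw_increments` yields ordered quantiles at every step, with no post hoc sorting.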