Supplements of "Non-crossing quantile regression in deep reinforcement learning"
– Neural Information Processing Systems
We first introduce the following lemma, which is used to complete the proof of Lemma 1.

Lemma. Consider an MDP with countable state and action spaces.

Therefore, inequality (4) holds, which completes the proof.

We now give the proof of Lemma 1.

Lemma 1. The proof is similar to the argument in the proof of Proposition 2 of [1]. We assume that the instantaneous reward given a state-action pair is deterministic; the general case is a straightforward generalization via a standard regular-probability argument.
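The supplement does not spell out how non-crossing quantile estimates are produced in practice. As a purely illustrative sketch (not the paper's stated construction), one common way to guarantee that estimated quantiles never cross is to parameterize them as a free base value plus a cumulative sum of non-negative increments; the function name and shapes below are assumptions for illustration only:

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def non_crossing_quantiles(raw_outputs):
    """Map unconstrained network outputs to monotonically
    non-decreasing quantile estimates (illustrative sketch).

    raw_outputs: array of shape (..., N), one value per quantile level.
    The first entry is a free base value; the remaining entries pass
    through softplus so every increment is non-negative, which
    guarantees q_1 <= q_2 <= ... <= q_N, i.e. no quantile crossing.
    """
    base = raw_outputs[..., :1]
    increments = softplus(raw_outputs[..., 1:])
    return np.concatenate([base, base + np.cumsum(increments, axis=-1)], axis=-1)

# hypothetical raw network outputs for 5 quantile levels
raw = np.array([0.3, -1.2, 0.8, -0.5, 2.0])
q = non_crossing_quantiles(raw)
assert np.all(np.diff(q) >= 0)  # quantiles never cross
```

Because each increment is strictly non-negative by construction, monotonicity holds for any network output, rather than being encouraged only through the loss.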