Supplements of "Non-crossing quantile regression in deep reinforcement learning"
– Neural Information Processing Systems
We first introduce the following lemma, which is used to complete the proof of Lemma 1.

Lemma. Consider an MDP with countable state and action spaces.

Therefore, inequality (4) holds, which completes the proof.

We now give the proof of Lemma 1.

Lemma 1. The proof is similar to the argument in the proof of Proposition 2 of [1]. We assume that the instantaneous reward given a state-action pair is deterministic; the general case is a straightforward generalization via a standard regular-probability argument.
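The supplement does not spell out how non-crossing quantile estimates are produced in practice. As a purely illustrative sketch (not the paper's stated construction), one common way to guarantee that estimated quantiles never cross is to parameterize them as a free base value plus a cumulative sum of non-negative increments; the function name and shapes below are assumptions for illustration only:

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def non_crossing_quantiles(raw_outputs):
    """Map unconstrained network outputs to monotonically
    non-decreasing quantile estimates (illustrative sketch).

    raw_outputs: array of shape (..., N), one value per quantile level.
    The first entry is a free base value; the remaining entries pass
    through softplus so every increment is non-negative, which
    guarantees q_1 <= q_2 <= ... <= q_N, i.e. no quantile crossing.
    """
    base = raw_outputs[..., :1]
    increments = softplus(raw_outputs[..., 1:])
    return np.concatenate([base, base + np.cumsum(increments, axis=-1)], axis=-1)

# hypothetical raw network outputs for 5 quantile levels
raw = np.array([0.3, -1.2, 0.8, -0.5, 2.0])
q = non_crossing_quantiles(raw)
assert np.all(np.diff(q) >= 0)  # quantiles never cross
```

Because each increment is strictly non-negative by construction, monotonicity holds for any network output, rather than being encouraged only through the loss.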