A Proof of Theorem 1

Recall that under maximum entropy RL, the Q-function is defined as Q π ent, a

Neural Information Processing Systems 

We use "uncorrected" to denote prioritized sampling without importance sampling (IS) corrections.
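
To make the distinction concrete, the following is a minimal sketch of prioritized sampling with and without IS corrections. It assumes the common convention in which a transition is drawn with probability proportional to its priority and the correction weight is the ratio of uniform to prioritized probability, 1 / (N p_i). The function name `sample_prioritized`, the `correct` flag, and the priority values are hypothetical and used only for illustration; the exact prioritization scheme of the method may differ.

```python
import random


def sample_prioritized(priorities, batch_size, correct=True, rng=None):
    """Draw indices with probability proportional to (positive) priorities.

    Returns (indices, weights).
    correct=False: all weights are 1.0 (the "uncorrected" variant).
    correct=True:  weights are the IS ratios uniform / prioritized,
                   i.e. 1 / (N * p_i), an assumed standard convention.
    """
    rng = rng or random.Random()
    n = len(priorities)
    total = sum(priorities)
    probs = [p / total for p in priorities]  # sampling distribution p_i

    # Sample with replacement according to the prioritized distribution.
    indices = rng.choices(range(n), weights=probs, k=batch_size)

    if correct:
        weights = [1.0 / (n * probs[i]) for i in indices]
    else:
        weights = [1.0] * batch_size
    return indices, weights


if __name__ == "__main__":
    priorities = [1.0, 1.0, 2.0]  # hypothetical priorities
    print(sample_prioritized(priorities, 4, correct=False, rng=random.Random(0)))
    print(sample_prioritized(priorities, 4, correct=True, rng=random.Random(0)))
```

In the uncorrected case every sampled transition contributes with weight 1, so high-priority transitions are over-represented in the resulting update relative to uniform sampling; the corrected case down-weights them to compensate.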