A Proof of Theorem 1

Recall that under maximum entropy RL, the Q-function is defined as Q π ent, a