Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
