Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
