Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks
