Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks
