Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks