Competitive Self-Play

#artificialintelligence 

We set up competitions between multiple simulated 3D robots on a range of basic games, trained each agent with simple goals (push the opponent out of the sumo ring, reach the other side of the ring while preventing the other agent from doing the same, kick the ball into the net or prevent the other agent from doing so, and so on), then analyzed the different strategies that emerged. Agents initially receive dense rewards for behaviours that aid exploration like standing and moving forward, which are eventually annealed to zero in favor of being rewarded for just winning and losing. Despite the simple rewards, the agents learn subtle behaviors like tackling, ducking, faking, kicking and catching, and diving for the ball. Each agent's neural network policy is independently trained with Proximal Policy Optimization. To understand how complex behaviors can emerge through a combination of simple goals and competitive pressure, let's analyze the sumo wrestling task.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found