To test out some harmless uses for AI, one OpenAI team taught a bot to play Dota 2. Musk thanked the company via Twitter for allowing OpenAI to use the Microsoft Azure cloud computing platform to develop the bot. "Would like to express our appreciation to Microsoft for use of their Azure cloud computing platform," he wrote.
To allow for greater flexibility, I will then describe how to build a class of reinforcement learning agents that can optimize for various goals, using a method called "direct future prediction" (DFP). Reinforcement learning involves an agent interacting with an environment to maximize the reward it obtains over time. Q-learning and other traditionally formulated reinforcement learning algorithms learn from a single reward signal and, as such, can only pursue a single "goal" at a time. If we want our drone to learn to deliver packages, we simply provide a positive reward of 1 for successfully flying to a marked location and making a delivery.
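To make the single-reward limitation concrete, here is a minimal sketch of tabular Q-learning (not the DFP agent itself) applied to a toy version of the drone example. The one-dimensional "corridor" environment, the state numbering, and the +1 delivery reward are illustrative assumptions, not part of any real drone API; the point is that the whole learning signal is one scalar `reward`.

```python
import random

def chain_step(state, action):
    """Toy 1-D corridor: states 0..4; reaching state 4 'delivers' the package."""
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0  # the single scalar reward signal
    return next_state, reward, next_state == 4

def q_learning_episode(q, env_step, start_state, actions,
                       alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=100):
    """Run one episode, updating the Q-table in place from one reward signal."""
    state = start_state
    for _ in range(max_steps):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q.get((state, a), 0.0))
        next_state, reward, done = env_step(state, action)
        # classic Q-learning update: one reward, hence one fixed goal
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        q[(state, action)] = q.get((state, action), 0.0) + alpha * (
            reward + gamma * best_next - q.get((state, action), 0.0))
        if done:
            break
        state = next_state
    return q

random.seed(0)
q_table = {}
for _ in range(200):
    q_learning_episode(q_table, chain_step, 0, ["right", "left"])
```

After training, the learned value of moving "right" near the goal dominates, but to pursue a different goal (say, conserving battery) the reward function itself would have to be rewritten and the agent retrained, which is exactly the inflexibility DFP is meant to address.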
The agent uses a method called "deep learning" to turn the basic visual input into meaningful concepts, mirroring the way the human brain takes raw sensory information and transforms it into a rich understanding of the world. The agent is programmed to work out what is meaningful through "reinforcement learning", the basic notion that scoring points is good and losing them is bad. In videos provided by DeepMind, the agent is shown making random and largely unsuccessful movements at the start, but after 600 rounds of training (two weeks of computer time) it has figured out what many of the games are about. Hassabis stops short of calling this a "creative step", but says it proves computers can "figure things out for themselves" in a way that is normally thought of as uniquely human.