To test out some harmless uses for AI, one OpenAI team taught a bot to play Dota 2. Musk thanked Microsoft via Twitter for allowing OpenAI to use the Azure cloud computing platform to develop the bot. "Would like to express our appreciation to Microsoft for use of their Azure cloud computing platform," he wrote.
The Elon Musk-backed OpenAI team has developed a machine learning system that has beaten "many" of the best pro Dota 2 players in one-on-one matches, including star player Dendi during a live demonstration at The International. The result is an AI that not only has the fundamentals nailed down, but understands the nuances that take human players a long time to master. And it doesn't take too long to learn, either; OpenAI's creation can beat regular Dota 2 bots after an hour of learning, and beat the best humans after just two weeks. One-on-one matches are far less complex than standard five-on-five matches, and it's notable that the machine learning system doesn't use the full range of tactics you see from human rivals.
A bot created by the Elon Musk-backed nonprofit OpenAI defeated champion human player Danylo "Dendi" Ishutin in two back-to-back demonstration matches. Dota 2 is vastly more complex than traditional board games like chess and Go. It's currently one of the most popular games from Valve, the publisher that organized last night's event, and one of the most popular competitive e-sports games worldwide. Perhaps most significantly, while Ishutin was defeated in a 1-on-1 match, Dota 2 is normally played by opposing teams of five players each.
OpenAI hopes to have its bot mastering five-on-fives by next year's International, though. What OpenAI has learned with Dota 2 might just translate to other fields where understanding subtleties can be crucial to success.
Elon Musk, noted artificial intelligence worrywart, backs the tech firm behind a robot brain that was smart enough to take down a Dota 2 pro this week. The goal is to eventually assemble a full team of AI bots for 5v5 matches and, further down the road, to mix AI players in with human players on a single team. As you weigh all of this, remember again: Elon Musk, AI fearmonger extraordinaire, is an OpenAI founder. Teaching a robot brain to play battle-oriented strategy games... what could go wrong?
To allow for greater flexibility, I will then describe how to build a class of reinforcement learning agents called "direct future prediction" (DFP), which can optimize for various goals. Reinforcement learning involves agents interacting with some environment to maximize the rewards they obtain over time. Q-learning and other traditionally formulated reinforcement learning algorithms learn from a single reward signal, and as such can only pursue a single "goal" at a time. If we want our drone to learn to deliver packages, we simply provide a positive reward of 1 for successfully flying to a marked location and making a delivery.
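The single-reward setup described above can be sketched with tabular Q-learning on a toy task. This is a hypothetical illustration, not the article's drone environment: the agent walks a 1-D strip of cells and earns one scalar reward of 1 for reaching the "delivery" cell, which is the only goal the algorithm can pursue.

```python
import random

# Minimal tabular Q-learning sketch (hypothetical toy environment).
# Cells 0..4; cell 4 is the marked "delivery" location worth reward 1.
N_STATES = 5
ACTIONS = [-1, +1]            # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment transition: bounded move, reward 1 only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for episode in range(300):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: a single scalar reward drives all learning
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy after training: head right toward the delivery cell
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Because the update rule bootstraps everything from that one reward signal, optimizing for a second objective (say, battery use) would require changing the reward itself, which is the limitation DFP-style agents are meant to address.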
In the particular case of the Facebook negotiation chat bot, you give it examples of negotiation dialogs with the whole situation properly annotated -- what the initial state was, the preferences of the negotiator, what was said, what the result was, and so on. The program analyzes all these examples, extracts some features of each dialog, and assigns a number to each feature, representing how often dialogs with that feature ended in positive results for the negotiator. (Similarly, AlphaGo started learning from real games played by real people.) The original training data set was in English, but the extracted features were just words and phrases, and the bot was just putting them together based on the numerical representation of how likely they were to help get the desired outcome.
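The frequency-based scoring described above can be sketched as a simple counting scheme. The dialogs, outcomes, and unigram "features" below are invented for illustration; the real system used richer annotations and learned representations.

```python
from collections import Counter

# Hypothetical annotated examples: (dialog text, did the negotiator win?)
dialogs = [
    ("i need the hats you can have the rest", True),
    ("give me everything", False),
    ("you can have the ball i keep the hats", True),
    ("no deal", False),
]

# Count, for each feature (here just single words), how often it appears
# at all and how often it appears in dialogs with a positive outcome.
positive, total = Counter(), Counter()
for text, success in dialogs:
    for phrase in set(text.split()):
        total[phrase] += 1
        if success:
            positive[phrase] += 1

def score(utterance):
    """Average fraction of successful dialogs containing each word."""
    words = utterance.split()
    return sum(positive[w] / total[w] for w in words if total[w]) / max(len(words), 1)

print(score("you can have the hats"))   # words common in winning dialogs
print(score("no deal"))                 # words from losing dialogs
```

The bot then favors phrasings whose features have historically correlated with good outcomes, which is all the "strategy" such a numerical representation encodes.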
The agent uses a method called "deep learning" to turn the basic visual input into meaningful concepts, mirroring the way the human brain takes raw sensory information and transforms it into a rich understanding of the world. The agent is programmed to work out what is meaningful through "reinforcement learning", the basic notion that scoring points is good and losing them is bad. In videos provided by DeepMind, the agent is shown making random and largely unsuccessful movements at the start, but after 600 rounds of training (two weeks of computer time) it has figured out what many of the games are about. Hassabis stops short of calling this a "creative step", but says it proves computers can "figure things out for themselves" in a way that is normally thought of as uniquely human.
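The "scoring points is good, losing them is bad" principle can be shown in miniature with a toy agent, far simpler than DeepMind's actual deep network. In this invented two-action game, the agent starts out choosing randomly and, purely from score feedback, converges on the action that tends to earn points.

```python
import random

# Toy reinforcement sketch (not DeepMind's system): one action tends to
# score points, the other tends to lose them; the agent must discover which.
random.seed(1)
values = {"left": 0.0, "right": 0.0}   # estimated average score per action
counts = {"left": 0, "right": 0}

def play(action):
    # Hypothetical game: "right" scores ~+1 on average, "left" ~-1
    return random.gauss(1.0 if action == "right" else -1.0, 0.5)

for t in range(500):
    # epsilon-greedy: mostly exploit the current best, occasionally explore
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    reward = play(action)
    counts[action] += 1
    # incremental running mean of the scores seen for this action
    values[action] += (reward - values[action]) / counts[action]

best = max(values, key=values.get)
print(best, values)
```

The early random flailing followed by competent play in DeepMind's videos is this same dynamic, scaled up with a deep network that must first learn what the pixels mean before score feedback can shape behavior.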