AlphaGo, in context – Andrej Karpathy – Medium
AlphaGo is made up of a number of relatively standard techniques: behavior cloning (supervised learning on human demonstration data), reinforcement learning (REINFORCE), value functions, and Monte Carlo Tree Search (MCTS). In particular, AlphaGo uses an SL (supervised learning) policy to initialize an RL (reinforcement learning) policy, which is then refined through self-play. A value function is estimated from the RL policy and plugged into MCTS, which (somewhat surprisingly) samples rollouts with the worse but more diverse SL policy. AlphaGo therefore does not by itself rely on any fundamental algorithmic breakthrough in how we approach RL problems, and it remains an example of narrow AI. It does, however, symbolize Alphabet's AI power: the quantity and quality of the talent in the company, the computational resources at its disposal, and the all-in focus on AI from the very top.
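The first three stages of that pipeline (SL policy, RL fine-tuning, value estimate) can be illustrated with a minimal sketch. It is not AlphaGo's implementation: the "game" is an assumed three-move toy in which each move wins with a fixed probability, the policy is a bare softmax over three logits rather than a neural network, and MCTS is left out entirely. All names (`WIN_PROB`, `behavior_cloning`, `reinforce`, `estimate_value`) and all numbers are hypothetical, chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn logits into a probability distribution over moves."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavior_cloning(demos, n_actions, smoothing=1.0):
    """SL stage: maximum-likelihood fit to demonstrated moves.

    For a softmax over free logits, the fit is the log of the
    (smoothed) move frequencies in the demonstrations.
    """
    counts = [smoothing] * n_actions
    for action in demos:
        counts[action] += 1
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def reinforce(logits, win_prob, steps, lr, rng):
    """RL stage: REINFORCE with a running-mean baseline."""
    logits = list(logits)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        advantage = reward - baseline
        # d/d logit_k of log pi(action) is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
        baseline += (reward - baseline) / t
    return logits


def estimate_value(logits, win_prob, n_games, rng):
    """Value stage: Monte Carlo estimate of the policy's win rate."""
    probs = softmax(logits)
    wins = 0
    for _ in range(n_games):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        wins += rng.random() < win_prob[action]
    return wins / n_games


rng = random.Random(0)
WIN_PROB = [0.2, 0.5, 0.8]            # assumed toy game: move 2 is best
demos = [0] * 30 + [1] * 50 + [2] * 20  # assumed "human" data favoring move 1

sl_logits = behavior_cloning(demos, n_actions=3)
rl_logits = reinforce(sl_logits, WIN_PROB, steps=20000, lr=0.05, rng=rng)

sl_value = estimate_value(sl_logits, WIN_PROB, n_games=5000, rng=rng)
rl_value = estimate_value(rl_logits, WIN_PROB, n_games=5000, rng=rng)
print(f"SL policy win rate: {sl_value:.3f}")
print(f"RL policy win rate: {rl_value:.3f}")
```

In this toy setting the RL policy starts from the imitation of the demonstrations and should end up with a higher win rate, because the policy-gradient updates shift probability toward the move that wins most often. In the real system, the value estimate is a learned function of the board position rather than a single number.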
Jun-4-2017, 05:00:23 GMT
- Industry:
- Information Technology > Software (1.00)
- Leisure & Entertainment > Games
- Go (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Games > Go (1.00)
- Machine Learning (1.00)