AlphaGo, in context – Andrej Karpathy – Medium

Jun-4-2017, 05:00:23 GMT – #artificialintelligence

AlphaGo is made up of a number of relatively standard techniques: behavior cloning (supervised learning on human demonstration data), reinforcement learning (REINFORCE), value functions, and Monte Carlo Tree Search (MCTS). In particular, AlphaGo uses an SL (supervised learning) policy to initialize an RL (reinforcement learning) policy, which is then refined through self-play. A value function is estimated from that RL policy and plugged into MCTS, which (somewhat surprisingly) uses the worse, but more diverse, SL policy to sample rollouts. That being said, AlphaGo does not by itself use any fundamental algorithmic breakthroughs in how we approach RL problems, and it is still an example of narrow AI. What AlphaGo does symbolize is Alphabet's AI power: the quantity and quality of the talent present in the company, the computational resources at their disposal, and the all-in focus on AI from the very top.
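To make the order of the first three stages concrete, here is a minimal sketch on a toy one-move "game" rather than Go. Everything in it is an assumption made for illustration: the three moves, their win probabilities in `WIN_PROB`, the demonstration data, and the function names `behavior_clone`, `reinforce`, and `estimate_value` are all hypothetical, and the policies are plain softmax tables instead of neural networks. MCTS is left out entirely.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavior_clone(demo_moves, n_moves, smoothing=1.0):
    """SL step: the maximum-likelihood softmax policy for tabular data
    is the log of the (smoothed) move frequencies in the demonstrations."""
    counts = [smoothing] * n_moves
    for move in demo_moves:
        counts[move] += 1
    return [math.log(c) for c in counts]


def reinforce(init_logits, win_prob, rng, steps=5000, lr=0.05):
    """RL step: REINFORCE, starting from the SL policy's logits."""
    logits = list(init_logits)
    for _ in range(steps):
        probs = softmax(logits)
        move = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_prob[move] else -1.0
        # Gradient of log pi(move) w.r.t. the logits is onehot(move) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == move else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits


def estimate_value(logits, win_prob, rng, games=2000):
    """Value step: Monte Carlo estimate of the expected outcome (+1 win,
    -1 loss) when playing with the given policy."""
    probs = softmax(logits)
    total = 0.0
    for _ in range(games):
        move = rng.choices(range(len(probs)), weights=probs)[0]
        total += 1.0 if rng.random() < win_prob[move] else -1.0
    return total / games


rng = random.Random(0)
WIN_PROB = [0.2, 0.5, 0.8]               # hypothetical: move 2 is actually best
demos = [1] * 60 + [0] * 25 + [2] * 15   # hypothetical: humans prefer move 1

sl_logits = behavior_clone(demos, n_moves=3)
rl_logits = reinforce(sl_logits, WIN_PROB, rng)
sl_policy = softmax(sl_logits)
rl_policy = softmax(rl_logits)
value = estimate_value(rl_logits, WIN_PROB, rng)

print("SL policy:", [round(p, 3) for p in sl_policy])
print("RL policy:", [round(p, 3) for p in rl_policy])
print("estimated value under RL policy:", round(value, 3))
```

In this toy setting the SL policy stays spread across the moves humans tend to pick, while the RL policy concentrates on the move that wins most often. That is a loose analogue of the "worse, but more diverse" contrast described above, not a reproduction of AlphaGo's actual training.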
