r/MachineLearning - [R] [1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Nov-21-2019, 15:44:28 GMT – #artificialintelligence

Much of it is the same as Value Prediction Networks (VPN), which proposes that instead of training a model to minimize an L2 prediction loss on the next observation, you just train it to get the long-term reward/value right for a start state and a series of actions. That gets around a lot of the difficulty of using model-based RL (MBRL) for Atari-like things, where it's very hard to accurately predict next pixels. They pretty much simulate a dense tree to some short depth, assign estimated values to the nodes, and use that for action selection. One problem with this is that you're probably simulating a lot of states that your value function would tell you are DEFINITELY not worthwhile. Atari has up to 18 actions, so the number of simulated states grows exponentially with depth and it's infeasible to go more than about 3 states deep. Another is that since you're simulating in all directions, but only taking the best (ε-greedy) action, you're not going to gather training data on most of the transitions you're estimating.
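To make the cost of a dense tree concrete, here is a minimal Python sketch, not the code from either paper. The names `dense_tree_size`, `dense_lookahead`, `step`, and `value` are hypothetical; `step` stands in for a learned model that returns a next (abstract) state and a predicted reward, and `value` stands in for a learned value function. The lookahead expands every action at every level, which is exactly why it only works at shallow depths.

```python
def dense_tree_size(num_actions: int, depth: int) -> int:
    """Number of states simulated by a full tree of the given depth (root excluded)."""
    return sum(num_actions ** d for d in range(1, depth + 1))


def dense_lookahead(state, depth, num_actions, step, value, gamma=0.99):
    """Expand every action to `depth` (>= 1) and back up the best return.

    step(state, action) -> (next_state, reward)  # hypothetical learned model
    value(state) -> float                        # hypothetical learned value
    Returns (best_action, estimated_return).
    """
    def q(s, a, d):
        next_s, reward = step(s, a)
        if d == 1:
            # Leaf: bootstrap from the value estimate.
            return reward + gamma * value(next_s)
        # Interior node: try all actions again, keep the best.
        return reward + gamma * max(q(next_s, b, d - 1) for b in range(num_actions))

    returns = [q(state, a, depth) for a in range(num_actions)]
    best = max(range(num_actions), key=returns.__getitem__)
    return best, returns[best]


# Toy model (an assumption for illustration): integer states, reward on reaching 3.
def toy_step(s, a):
    next_s = s + a - 1
    return next_s, (1.0 if next_s == 3 else 0.0)


def toy_value(s):
    return 0.0
```

With 16 actions a depth-3 tree already means 4,368 simulated states and depth 4 means 69,904; with 18 actions (the full Arcade Learning Environment action set) depth 3 means 6,174. Only one of those first-level branches is ever actually executed in the environment, which is the data-coverage problem the comment describes.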

chess and shogi, machinelearning, mastering atari, (3 more...)

News Web Page

Industry:
- Media > News (0.40)
- Leisure & Entertainment > Games
  - Chess (0.40)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Communications > Social Media (0.76)