Explained simply: How DeepMind taught AI to play video games
Deep learning methods don't plug into reinforcement learning as easily as they do into supervised or unsupervised learning. Most deep learning applications have relied on huge training datasets with accurate samples and labels, and even in unsupervised learning the target cost function is still fairly convenient to work with. But in RL there's a catch. As you know, RL involves rewards that can be delayed many time steps into the future. In chess, for example, it can take several moves to set up the capture of the opponent's queen, and none of those earlier moves returns the same immediate reward as the final capturing move, even if one of them was more important than the capture itself. The rewards can also be noisy: sometimes the points awarded for a particular move are slightly random and not easily predictable!
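To make the delayed-reward problem concrete, here is a minimal Python sketch of the discounted return, a standard way RL passes credit for a later reward back to the moves that led to it. The function name `discounted_returns`, the discount factor `gamma`, and the toy reward sequence (a hypothetical four-move line where only the final queen capture scores 9 points) are illustrative assumptions, not details of DeepMind's setup.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode.

    Uses the recursion G_t = r_t + gamma * G_{t+1}, working backwards
    from the last step so each step sees everything that came after it.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical example: three setup moves earn nothing immediately,
# and only the final move (capturing the queen) is rewarded.
rewards = [0.0, 0.0, 0.0, 9.0]
returns = discounted_returns(rewards, gamma=0.9)

for t, (r, g) in enumerate(zip(rewards, returns)):
    print(f"move {t}: immediate reward={r}, discounted return={g:.3f}")
# move 0: immediate reward=0.0, discounted return=6.561
# move 1: immediate reward=0.0, discounted return=7.290
# move 2: immediate reward=0.0, discounted return=8.100
# move 3: immediate reward=9.0, discounted return=9.000
```

The setup moves still get zero immediate reward, but their discounted returns are nonzero, so a learner can tell they mattered. The smaller `gamma` is, the faster that credit fades with distance from the reward.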
Aug-27-2017, 19:55:16 GMT