In September 2018, I got the opportunity to attend the Deep Learning Indaba conference, held at Stellenbosch University, South Africa. Deep Learning Indaba was formed with the aim of strengthening African machine learning, increasing African participation in and contribution to advances in artificial intelligence and machine learning, and addressing issues of diversity in these fields of science. One of the lectures I really enjoyed was on success stories of reinforcement learning, which introduced reinforcement learning and showed how it has been used to build some pretty awesome computer programs. This lecture was presented by David Silver. Professor David Silver leads the reinforcement learning research group at DeepMind, an AI company based in London that was acquired by Google in 2014.
It is useful, for the forthcoming discussion, to have a better understanding of some key terms used in RL.

Agent: A software or hardware mechanism that takes actions based on its interaction with the surrounding environment; for example, a drone making a delivery, or Super Mario navigating a video game. The algorithm is the agent.

Action: One of the possible moves the agent can make. Actions are almost self-explanatory, but it should be noted that agents usually choose from a list of discrete possible actions.
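To make these terms concrete, here is a minimal toy sketch (not from any of the systems discussed here) of an agent choosing from a discrete action set and learning action values from rewards via a simple epsilon-greedy rule; the action names and reward numbers are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical discrete action set for illustration.
ACTIONS = ["left", "right", "jump"]

class Agent:
    """A toy agent: picks an action, observes a reward, updates its estimates."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        # Estimated average reward for each action, learned from experience.
        self.values = {a: 0.0 for a in self.actions}
        self.counts = {a: 0 for a in self.actions}

    def act(self):
        # Explore a random action with probability epsilon, else exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental average of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

agent = Agent(ACTIONS)
for _ in range(1000):
    a = agent.act()
    # Invented environment: "jump" pays the most on average, plus noise.
    reward = {"left": 0.0, "right": 0.5, "jump": 1.0}[a] + random.gauss(0, 0.1)
    agent.learn(a, reward)

best = max(agent.values, key=agent.values.get)
print(best)
```

After enough interactions the agent's value estimates single out the highest-paying action, which is the core agent-action-reward loop the definitions above describe.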
Check out the GitHub repo for an implementation of TD-Gammon with TensorFlow. A few weeks ago AlphaGo won a historic tournament playing the game of Go against Lee Sedol, one of the top Go players in the world. Many people have compared AlphaGo to Deep Blue, which won a series of famous chess matches against Garry Kasparov, but a different comparison may be made with the game of backgammon. Before DeepMind tackled playing Atari games or built AlphaGo, there was TD-Gammon, the first algorithm to reach an expert level of play in backgammon. Gerald Tesauro published his paper in 1992 describing TD-Gammon as a neural network trained with reinforcement learning.
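The temporal-difference idea at the heart of TD-Gammon can be sketched in a few lines: after each step, nudge the value estimate of the current state toward the value of the state that follows. TD-Gammon actually used TD(λ) with a neural network over backgammon positions; the tabular TD(0) toy below, on the classic five-state random walk, is only a minimal illustration of the update rule, with all numbers chosen for the example.

```python
import random

random.seed(0)

N = 5              # non-terminal states 0..4; exiting on the right pays reward 1
V = [0.5] * N      # value estimates, initialized arbitrarily
alpha = 0.1        # step size

for episode in range(2000):
    s = N // 2     # each episode starts in the middle state
    while True:
        s2 = s + random.choice([-1, 1])   # random walk left or right
        if s2 < 0:
            # Fell off the left end: terminal, reward 0.
            V[s] += alpha * (0.0 - V[s])
            break
        if s2 >= N:
            # Reached the right end: terminal, reward 1.
            V[s] += alpha * (1.0 - V[s])
            break
        # TD(0) update: move V[s] toward the (zero) reward plus V of next state.
        V[s] += alpha * (V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])  # roughly increasing from left to right
```

The learned values approximate the true probabilities of eventually exiting on the right (1/6, 2/6, ..., 5/6), showing how value estimates propagate backward through experience; TD-Gammon applied the same principle with a network generalizing across board positions instead of a table.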
A new artificial intelligence startup called Osaro aims to give industrial robots the same turbocharge that DeepMind Technologies gave Atari-playing computer programs. In December 2013, DeepMind showcased a type of artificial intelligence that had mastered seven Atari 2600 games from scratch in a matter of hours, and could outperform some of the best human players. Google swiftly snapped up the London-based company, and the deep-reinforcement learning technology behind it, for a reported $400 million. Now Osaro, with $3.3 million in investments from the likes of Peter Thiel and Jerry Yang, claims to have taken deep-reinforcement learning to the next level, delivering the same superhuman AI performance but over 100 times as fast. Deep-reinforcement learning arose from deep learning, a method of using multiple layers of neural networks to efficiently process and organize mountains of raw data (see "10 Breakthrough Technologies 2013: Deep Learning").
The idea of implementing reinforcement learning in a computer was one of the earliest ideas about the possibility of AI, but reinforcement learning remained on the margins of AI until relatively recently. Today we see reinforcement learning playing essential roles in some of the most impressive AI applications. This article presents observations from the author's personal experience with reinforcement learning over the most recent 40 years of its history in AI, focusing on striking connections that emerged between largely separate disciplines and on some of the findings that surprised him along the way. These connections and surprises place reinforcement learning in a historical context, and they help explain the success it is finding in modern AI. The article concludes by discussing some of the challenges that need to be faced as reinforcement learning moves out into the real world.