"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
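The trial-and-error idea in the quote above can be illustrated with a small sketch. It is not taken from the book; it is a minimal epsilon-greedy agent on a multi-armed bandit, and the function name `run_bandit`, the Gaussian reward model, and every parameter value are assumptions chosen for illustration. The agent is never told which action is best. It only sees the rewards of the actions it tries.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with Gaussian rewards (hypothetical setup)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # running average reward per action
    counts = [0] * n_actions        # how often each action was tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy numerical reward signal.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the sample average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 1.5])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was tried:", counts)
```

With enough steps, the agent should end up choosing the highest-paying action most of the time, even though its true mean reward is hidden from it.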
A startup called CogitAI has developed a platform that lets companies use reinforcement learning, the technique that gave AlphaGo mastery of the board game Go. Gaining experience: AlphaGo, an AI program developed by DeepMind, taught itself to play Go by practicing. It's practically impossible for a programmer to manually code in the best strategies for winning. Instead, reinforcement learning let the program figure out how to defeat the world's best human players on its own. Drug delivery: Reinforcement learning is still an experimental technology, but it is gaining a foothold in industry.
What will be the next thing to revolutionize data science in 2019? Reinforcement learning will be the next big thing in data science in 2019. While RL has been around for a long time in academia, it has hardly seen any industry adoption at all. Why? Partly because there has been plenty of low-hanging fruit to pick in predictive analytics, but mostly because of the barriers in implementation, knowledge and available tools. The potential value in using RL in proactive analytics and AI is enormous, but it also demands a greater skillset to master.
However, even these reinforcement learning algorithms couldn't transfer what they'd learned about one task to acquiring a new task. To achieve this, DeepMind supercharged a reinforcement learning algorithm called A3C. In so-called actor-critic reinforcement learning, of which A3C is one variety, acting and learning are decoupled so that one neural network, the critic, evaluates the other, the actor. Together, they drive the learning process. This was already the state of the art, but DeepMind added a new off-policy correction algorithm called V-trace to the mix, which made the learning more efficient and, crucially, better able to achieve positive transfer between tasks.
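The actor-critic split described above can be sketched in a few lines. This is not A3C and it does not include V-trace: it is a simplified, single-threaded, on-policy, one-step actor-critic that uses lookup tables instead of neural networks. The corridor environment, the function names (`softmax`, `step`, `train`), and all hyperparameters are assumptions made up for illustration.

```python
import math
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
LEFT, RIGHT = 0, 1


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Move one cell; reward 1.0 only when the goal cell is reached."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=1000, gamma=0.9, alpha_actor=0.1, alpha_critic=0.1,
          max_steps=200, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences
    values = [0.0] * N_STATES                      # critic: state-value estimates

    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            # Actor chooses an action from its current policy.
            probs = softmax(prefs[state])
            action = RIGHT if rng.random() < probs[RIGHT] else LEFT
            nxt, reward, done = step(state, action)

            # Critic evaluates the outcome with a one-step TD error.
            target = reward + (0.0 if done else gamma * values[nxt])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error

            # Actor update: the TD error scales the gradient of
            # log softmax, which is (1[a == action] - probs[a]).
            for a in (LEFT, RIGHT):
                grad = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += alpha_actor * td_error * grad

            if done:
                break
            state = nxt

    return prefs, values


if __name__ == "__main__":
    prefs, values = train()
    for s in range(N_STATES - 1):
        print(s, "P(right) =", round(softmax(prefs[s])[RIGHT], 3),
              "V =", round(values[s], 3))
```

In this sketch the critic's TD error is the only learning signal the actor receives, which is the sense in which one component evaluates the other. A3C runs many such actor-learners in parallel with neural networks, and V-trace adds a correction for data gathered under a slightly stale policy. Neither is attempted here.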
We are building technology that enables existing robot hardware to handle a much wider range of tasks where existing solutions break down, for example, bin picking of complex shapes, kitting, assembly, depalletizing of irregular stacks, and manipulation of deformable objects such as wires, cables, fabrics, linens, fluid-bags, and food. To equip existing robots with these skills, our software builds on the latest advances in deep reinforcement learning, deep imitation learning, and few-shot learning, to all of which the founding team has made significant contributions.
Canada has produced several big breakthroughs in artificial intelligence in recent years, and its government is keen to establish the country as a global epicenter of AI. The country's prime minister, Justin Trudeau, also hopes that the technology will learn Canadian values as it grows up. Speaking at a major AI event in Toronto today, Trudeau demonstrated an impressive enthusiasm for AI and machine learning, at one point even taking a stab at describing the concept of deep reinforcement learning, an approach that lets computers learn to do complex things that can't be programmed manually (see "10 Breakthrough Technologies 2017: Reinforcement Learning"). Both deep reinforcement learning and deep neural networks, which the method exploits, were pioneered by researchers working at Canadian universities. The country's government is now investing in big efforts to spur more AI research.