"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
While much work in data science to date has focused on algorithmic scale and sophistication, safety -- that is, safeguards against harm -- is a domain no less worth pursuing. This is particularly true in applications like self-driving vehicles, where a machine learning system's poor judgement might contribute to an accident. That's why firms like Intel's Mobileye and Nvidia have proposed frameworks to guarantee safe and logical decision-making, and it's why OpenAI -- the San Francisco-based research firm cofounded by CTO Greg Brockman, chief scientist Ilya Sutskever, and others -- today released Safety Gym. OpenAI describes it as a suite of tools for developing AI that respects safety constraints while training, and for comparing the "safety" of algorithms and the extent to which those algorithms avoid mistakes while learning. Safety Gym is designed for reinforcement learning agents, or AI that's progressively spurred toward goals via rewards (or punishments).
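To make the closing idea concrete — an agent "progressively spurred toward goals via rewards (or punishments)" while also respecting a safety signal — here is a toy sketch. The two-action environment, the cost-as-penalty shaping, and all names are illustrative assumptions of mine, not Safety Gym's actual API or OpenAI's method:

```python
import random

random.seed(0)
LAMBDA = 2.0                      # weight on safety-cost violations

def pull(action):
    """'risky' pays more reward but always incurs a unit safety cost."""
    if action == "risky":
        return 1.0, 1.0           # (reward, cost)
    return 0.4, 0.0               # 'safe': lower reward, no cost

value = {"risky": 0.0, "safe": 0.0}
counts = {"risky": 0, "safe": 0}
for t in range(2000):
    # epsilon-greedy on the cost-penalized return
    a = random.choice(["risky", "safe"]) if random.random() < 0.1 \
        else max(value, key=value.get)
    reward, cost = pull(a)
    shaped = reward - LAMBDA * cost   # fold the safety cost into the reward
    counts[a] += 1
    value[a] += (shaped - value[a]) / counts[a]
```

With the penalty weight high enough, the agent's value estimates favor the safe action even though the risky one pays more raw reward — the kind of trade-off a benchmark like Safety Gym is built to measure.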
In January, artificial intelligence (AI) powerhouse DeepMind announced it had achieved a major milestone in its journey toward building AI systems that resemble human cognition. AlphaStar was a DeepMind agent, designed using reinforcement learning, that was able to beat two professional players at StarCraft II, one of the most complex real-time strategy games of all time. Over the following months, DeepMind continued evolving AlphaStar to the point that the agent can now play a full game of StarCraft II at Grandmaster level, outranking 99.8% of human players. The results were recently published in Nature, and they showcase some of the most advanced self-learning techniques used in modern AI systems. DeepMind's milestone is best explained by tracing the trajectory from the first version of AlphaStar to the current one, along with some of the key challenges StarCraft II poses.
Two branches of AI -- deep learning (DL) and reinforcement learning (RL) -- are now responsible for many real-world applications. Machine translation, speech recognition, object detection, robot control, and drug discovery are just a few of the numerous examples. Both approaches are data-hungry: DL requires many examples of each class, and RL needs to play through many episodes to learn a policy. A small child, by contrast, can typically see an image just once and instantly recognize it in other contexts and environments. We seem to possess an innate model, or representation, of how the world works, which helps us grasp new concepts and adapt to new situations quickly.
Students at the University of Alberta are getting hands-on experience with artificial intelligence through a new robotic arm. Donated to the university's department of computing science by Kindred AI, a Canadian artificial intelligence company, the robotic arm helps students in the classroom get a sense of reinforcement learning. Reinforcement learning is a branch of artificial intelligence, says Rupam Mahmood, assistant professor at the U of A and former Kindred AI research lead. "In reinforcement learning, we study by letting the agent interact with the environment, so that it can take the right set of actions," said Mahmood. Usually, such study is done through computer simulations and board games, but for real-world applications a robotic arm is used.
One of my favorite things about deep reinforcement learning is that, unlike supervised learning, it really, really doesn't want to work. Throwing a neural net at a computer vision problem might get you 80% of the way there. Throwing a neural net at an RL problem will probably blow something up in front of your face -- and it will blow up in a different way each time you try. A lot of the biggest challenges in RL revolve around how we interact with the environment effectively. In this post, I want to explore a few recent directions in deep RL research that attempt to address these challenges, and do so with particularly elegant parallels to human cognition.
Massive IoT, encompassing large numbers of resource-constrained IoT devices, has gained great attention. IoT devices generate enormous traffic, which causes network congestion. To manage network congestion, multi-channel-based algorithms have been proposed. However, most existing multi-channel algorithms require strict synchronization and extra overhead for negotiating channel assignments, which pose significant challenges for resource-constrained IoT devices. In this paper, a distributed channel selection algorithm utilizing tug-of-war (TOW) dynamics is proposed to improve successful frame delivery across the whole network by letting IoT devices adaptively select suitable channels for communication.
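To picture the general idea — devices that pull their preference for a channel up after a collision-free transmission and down after a collision, with no coordination between them — here is a heavily simplified sketch. It is my own illustration of the intuition, not the paper's actual TOW algorithm, and all parameters are assumptions:

```python
import random

random.seed(1)
N_DEVICES, N_CHANNELS, ROUNDS = 3, 3, 300
prefs = [[0.0] * N_CHANNELS for _ in range(N_DEVICES)]
history = []                                  # 1 = success, 0 = collision

for _ in range(ROUNDS):
    # each device independently picks its currently preferred channel
    # (a tiny random jitter breaks ties without any negotiation)
    picks = [max(range(N_CHANNELS), key=lambda c: p[c] + random.random() * 1e-3)
             for p in prefs]
    for d, ch in enumerate(picks):
        if picks.count(ch) == 1:              # alone on the channel: success
            prefs[d][ch] += 1.0               # pull preference up
            history.append(1)
        else:                                 # collision with another device
            prefs[d][ch] -= 1.0               # pull preference down
            history.append(0)

# over time the devices spread out over distinct channels and the
# collision rate drops, without any explicit channel negotiation
```

The appeal for resource-constrained devices is that each one stores only a handful of counters and never exchanges synchronization messages.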
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers.
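The core mechanism — several actor-learner threads each computing a gradient against shared parameters and applying it immediately, without waiting for the others — can be sketched on a toy problem. The objective (minimizing (w - 3)^2), the thread count, and the learning rate are all illustrative assumptions, not the paper's setup:

```python
import threading

shared_w = [0.0]                     # shared parameter, starts far from optimum
lock = threading.Lock()              # keeps each toy update atomic
LR, STEPS = 0.01, 2000

def actor_learner():
    """Each worker repeatedly computes a gradient and applies it at once."""
    for _ in range(STEPS):
        with lock:
            grad = 2.0 * (shared_w[0] - 3.0)   # d/dw of (w - 3)^2
            shared_w[0] -= LR * grad           # update shared parameters directly

threads = [threading.Thread(target=actor_learner) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# the shared parameter ends up at the optimum, w = 3
```

In the real framework the workers would each interact with their own copy of the environment, which is where the stabilizing decorrelation of experience comes from; the sketch only shows the asynchronous shared-parameter update itself.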