Safety is an emerging concern in deep learning systems: agents must respect the safety dynamics of a given environment. In supervised learning, safety constraints can often be encoded in the training data. Reinforcement learning, however, requires agents to master an environment's dynamics by experimenting with it, which introduces its own set of safety concerns. To address some of these challenges, OpenAI recently open-sourced Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.
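Safety Gym environments report a separate safety cost alongside the ordinary reward, and constrained RL methods trade the two off, for instance with a Lagrange multiplier that grows while the agent exceeds its cost budget. Below is a minimal, self-contained sketch of that idea; the toy environment and the proportional policy adjustment are illustrative stand-ins, not OpenAI's implementation.

```python
import random

class ToyConstrainedEnv:
    """Hypothetical stand-in for a Safety Gym-style environment:
    step() returns a reward plus a separate safety cost in `info`."""
    def __init__(self, hazard_rate=0.3):
        self.hazard_rate = hazard_rate
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.5  # the risky action pays more
        cost = 1.0 if (action == 1 and random.random() < self.hazard_rate) else 0.0
        done = self.t >= 20
        return 0.0, reward, done, {"cost": cost}  # cost kept separate from reward

def run_episode(env, risky_prob):
    env.reset()
    total_r, total_c, done = 0.0, 0.0, False
    while not done:
        action = 1 if random.random() < risky_prob else 0
        _, r, done, info = env.step(action)
        total_r += r
        total_c += info["cost"]
    return total_r, total_c

# Lagrangian-style loop: dual ascent raises the multiplier while average
# episode cost exceeds the budget, discouraging the risky action.
random.seed(0)
env = ToyConstrainedEnv()
lam, budget, lr = 0.0, 1.0, 0.05
risky_prob = 0.9
for _ in range(200):
    _, cost = run_episode(env, risky_prob)
    lam = max(0.0, lam + lr * (cost - budget))  # dual ascent on the multiplier
    # a real agent would optimize reward - lam * cost; here we simply
    # shrink the risky-action probability in proportion to lam
    risky_prob = max(0.1, 0.9 - 0.05 * lam)
```

After the loop, the multiplier has pushed the policy away from the risky action, which is the qualitative behavior constrained RL methods aim for.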
Reinforcement Learning (RL) has been one of the most important recent innovations in the development of highly advanced AI systems, with the potential to solve complex decision-making problems. It generally follows a "trial and error" method to learn optimal policies for a given problem, and it has been used to achieve superhuman performance in competitive strategy games, including Go, StarCraft, and Dota. Despite the promise shown by reinforcement learning algorithms in many decision-making problems, a number of challenges still need to be addressed.
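The "trial and error" method mentioned above is easiest to see in a multi-armed bandit, the simplest RL setting: an epsilon-greedy agent occasionally tries random actions (exploration) and otherwise picks the action with the best estimated payoff (exploitation). The arm means below are arbitrary illustrative values.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Trial-and-error learning on a multi-armed bandit: explore with
    probability eps, otherwise exploit the current best estimate."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n)  # explore: try a random action
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough trials, the arm with the highest true mean (the third one here) ends up both pulled most often and estimated most accurately.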
Where the training of machine learning models is concerned, there is always a risk of overfitting -- corresponding too closely -- to a particular set of data. In fact, popular machine learning benchmarks like the Arcade Learning Environment may even encourage overfitting, in that they place little emphasis on generalization. That's why OpenAI -- the San Francisco-based research firm cofounded by CTO Greg Brockman, chief scientist Ilya Sutskever, and others -- today released the Procgen Benchmark, a set of 16 procedurally generated environments (CoinRun, StarPilot, CaveFlyer, Dodgeball, FruitBot, Chaser, Miner, Jumper, Leaper, Maze, BigFish, Heist, Climber, Plunder, Ninja, and BossFight) that measure how quickly a model learns generalizable skills. It builds atop the startup's CoinRun toolset, which used procedural generation to construct separate sets of training and test levels. "We want the best of both worlds: a benchmark comprised of many diverse environments, each of which fundamentally requires generalization," wrote OpenAI in a blog post.
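The core idea behind measuring generalization with procedural generation can be sketched in a few lines: a seed fully determines each level, so training on one seed range and testing on a disjoint range guarantees the agent has never seen the test levels. The level generator below is a hypothetical stand-in, not Procgen's actual generator.

```python
import random

def generate_level(seed, size=8):
    """Hypothetical procedural level generator: the seed fully
    determines the layout, so distinct seeds yield distinct levels."""
    rng = random.Random(seed)
    return [[rng.choice(" #") for _ in range(size)] for _ in range(size)]

# Disjoint seed ranges give disjoint train and test level sets, so
# test performance measures generalization rather than memorization.
train_seeds = range(0, 500)
test_seeds = range(500, 600)

train_levels = [generate_level(s) for s in list(train_seeds)[:3]]
test_levels = [generate_level(s) for s in list(test_seeds)[:3]]
```

Because the generator is deterministic per seed, the same seed always reproduces the same level, which also makes evaluation runs repeatable.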
Roundup: If you can't get enough of machine learning news, here's a roundup of extra tidbits to keep your addiction ticking away. Read on to learn more about how DeepMind is helping Google's Play Store, and about a new virtual environment from OpenAI for training agents safely. An AI recommendation system for the Google Play Store: DeepMind is helping Android users find new apps in the Google Play Store with the help of machine learning. "We started collaborating with the Play store to help develop and improve systems that determine the relevance of an app with respect to the user," the London-based lab said this week. Engineers built a model known as a candidate generator.
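The details of DeepMind's Play Store system aren't spelled out here, but a candidate generator in large-scale recommenders typically retrieves a small top-k subset of items for a downstream ranker, often by scoring learned embeddings. The sketch below illustrates that general pattern with made-up app names and hand-picked vectors; it is not DeepMind's model.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def generate_candidates(user_vec, app_vecs, k=2):
    """Hypothetical candidate generator: score every app against the
    user embedding, keep only the top-k for the downstream ranker."""
    scored = sorted(app_vecs.items(),
                    key=lambda kv: dot(user_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

apps = {
    "maps":  [0.9, 0.1],
    "games": [0.1, 0.9],
    "music": [0.5, 0.5],
}
user = [0.2, 0.8]  # illustrative embedding; real systems learn these
candidates = generate_candidates(user, apps)
```

Narrowing millions of apps down to a few hundred candidates this way keeps the expensive ranking model's workload manageable.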
Deep RL is where deep learning is used in conjunction with RL in cases where the search space is very large, or the environment is very complicated, with multi-dimensional states, actions, and rewards. A prominent example is deep Q-learning, in which a deep neural network is used as a function approximator (the Q function), predicting the expected reward for a given input rather than exploring and storing rewards and actions for every state. Also, in simulation environments, simply feeding the pixels of an environment through a neural network allows the reinforcement learning algorithm to better understand its environment. For the most part, RL is being used to teach AI systems how to play games, as games provide a safe and bounded environment for learning. AlphaGo, for example, uses RL in combination with other techniques, and similar approaches have been used to teach AI to play Atari games and to compete at championship level in poker.
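The Q-learning update that a deep Q-network approximates can be shown in its tabular form on a tiny problem. Here is a minimal sketch on a simple chain world (the environment and hyperparameters are illustrative): the agent moves left or right, and only the rightmost state pays a reward. When the state space grows too large for a table, a neural network replaces it, which is the "deep" in deep Q-learning.

```python
import random

def q_learning_chain(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     eps=0.2, seed=0):
    """Tabular Q-learning on a chain: action 0 moves left, action 1
    moves right; reaching the rightmost state pays reward 1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            target = r + gamma * max(Q[s2])        # bootstrap from next state
            Q[s][a] += alpha * (target - Q[s][a])  # temporal-difference update
            s = s2
    return Q

Q = q_learning_chain()
```

After training, "right" should have the higher Q-value in every non-terminal state, i.e. the learned policy heads straight for the reward.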