Earlier this year a group of world experts convened to discuss doomsday scenarios and ways to counter them. They found it easy to discuss the threats humanity faces, but in the majority of cases they were stumped for solutions. This week DeepMind, Google's world-famous Artificial Intelligence (AI) arm, announced a world first: an answer to the potential AI apocalypse predicted by the group and by leading luminaries ranging from Elon Musk to Stephen Hawking, whose fears of a world dominated by AI-powered "killer robots" have been hitting the headlines all year. That answer takes the form of a test that can assess how dangerous AIs and algorithms really are, or, more importantly, could become.
Recent progress in AI and reinforcement learning (RL) has shown great success in solving complex problems with high-dimensional state spaces. However, most of these successes have come in simulated environments where failure carries little or no consequence. Most real-world applications require training solutions that are safe to operate, as catastrophic failures are inadmissible, especially when human interaction is involved. Current safe-RL systems use human oversight during training and exploration to ensure the RL agent does not enter a catastrophic state; these methods require a large amount of human labor and are difficult to scale up. We present a hybrid method for reducing human intervention time by combining model-based approaches with a trained supervised learner, improving sample efficiency while also ensuring safety. We evaluate these methods on various grid-world environments, using both standard and visual representations, and show that our approach outperforms traditional model-free approaches in sample efficiency, number of catastrophic states reached, and overall task performance.
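The core idea of replacing a human overseer with a trained "blocker" can be sketched in a few lines. The following is a minimal illustrative toy, not the paper's actual method: the 1-D gridworld, the scripted overseer standing in for a human, and the lookup-table "learner" are all assumptions chosen to keep the sketch self-contained.

```python
import random

# Hypothetical 1-D gridworld: positions 0..9, where position 9 is a
# catastrophe. The agent moves left (-1) or right (+1). All names and
# dynamics here are illustrative assumptions, not from the paper.
CATASTROPHE, SIZE = 9, 10

def human_overseer(state, action):
    """Scripted stand-in for a human: blocks any action that would
    lead directly into the catastrophic state."""
    return state + action == CATASTROPHE

# Phase 1: collect oversight labels while the "human" watches every step.
random.seed(0)
labels = {}  # (state, action) -> was it blocked?
for _ in range(500):
    s = random.randint(1, SIZE - 2)
    a = random.choice([-1, 1])
    labels[(s, a)] = human_overseer(s, a)

# Phase 2: a trained blocker (here a simple lookup table, standing in
# for a supervised learner) replaces the human during further exploration.
def learned_blocker(state, action):
    return labels.get((state, action), False)

def safe_step(state, action):
    """Exploration step with the blocker vetoing dangerous actions."""
    if learned_blocker(state, action):
        return state  # blocked: stay in place, no catastrophe occurs
    return state + action
```

After phase 1, the human never needs to watch again: `safe_step(8, +1)` is vetoed and returns `8`, while `safe_step(8, -1)` proceeds normally to `7`. A real implementation would replace the lookup table with a classifier that generalizes to unseen (state, action) pairs.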
Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our PARENTING algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it "matures", and that policies learnt through PARENTING are generalisable to new environments.
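The query-the-human pattern described above can be illustrated with a toy sketch. This is loosely in the spirit of the PARENTING idea, not the paper's actual algorithm: the environment, the scripted parent oracle, and the query-once rule are all illustrative assumptions.

```python
import random

random.seed(1)
ACTIONS = ["left", "right"]

def parent_prefers(state, a, b):
    """Scripted stand-in for a human parent: when asked to compare two
    candidate actions, endorses moving right except at the far edge."""
    good = "right" if state < 8 else "left"
    return a if a == good else b

experience = {}  # state -> action the parent endorsed

def choose(state):
    # The "immature" agent queries the parent on each unfamiliar state,
    # caches the endorsed action, and acts on it without asking again.
    if state not in experience:
        a, b = random.sample(ACTIONS, 2)
        experience[state] = parent_prefers(state, a, b)
    return experience[state]

# The agent walks from state 0, querying each new state exactly once.
s = 0
for _ in range(8):
    s += 1 if choose(s) == "right" else -1

print(s, len(experience))  # reaches state 8 after 8 queries
```

As the agent accumulates endorsed actions, its query rate drops to zero on familiar states, a crude analogue of "maturing" past the need for constant parental input.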
AI isn't going to be a threat to humanity because it's evil or cruel; AI will be a threat to humanity because we haven't properly explained what it is we want it to do. Consider the classic "paperclip maximizer" thought experiment, in which an all-powerful AI is told, simply, "make paperclips." The AI, not constrained by any human morality or reason, does so, eventually transforming all resources on Earth into paperclips and wiping out our species in the process. As with any relationship, when talking to our computers, communication is key. That's why a new piece of research published yesterday by Google's DeepMind and the Elon Musk-funded OpenAI institute is so interesting.