Reinforcement Learning
Neural Turing Machine • /r/MachineLearning
Hi folks, I have a few questions about NTMs. Are there any extensions to these models? There are, most notably the Reinforcement Learning NTM, which uses the REINFORCE rule to apply hard attention to the memory, as well as models that use the NTM with different access modules. There is an implementation in Torch.
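To make "hard attention trained with REINFORCE" concrete, here is a minimal sketch. It is not the RL-NTM's actual controller or memory interface. It assumes the memory is a plain list of rows and that the attention weights come from a softmax over some logits. A soft read blends every row and stays differentiable. A hard read samples a single row, which is non-differentiable, so the gradient with respect to the logits is estimated with the REINFORCE score-function rule. The reward value is made up for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over attention logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_read(memory, weights):
    # Soft (differentiable) read: weighted sum of every memory row.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

def hard_read(memory, weights, rng):
    # Hard read: sample exactly one row index from the attention distribution.
    idx = rng.choices(range(len(memory)), weights=weights, k=1)[0]
    return idx, memory[idx]

def reinforce_grad(weights, idx, reward, baseline=0.0):
    # REINFORCE (score-function) estimate of d E[reward] / d logits for a
    # softmax distribution: (reward - baseline) * (onehot(idx) - weights).
    advantage = reward - baseline
    return [advantage * ((1.0 if i == idx else 0.0) - p) for i, p in enumerate(weights)]

# Toy usage with a 3-row, 2-column memory and hand-picked logits.
rng = random.Random(0)
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = softmax([2.0, 0.5, 0.1])
blended = soft_read(memory, weights)
idx, row = hard_read(memory, weights, rng)
grad = reinforce_grad(weights, idx, reward=1.0)  # hypothetical reward
```

Subtracting a baseline (here defaulting to 0) does not change the expected gradient but is commonly used to reduce its variance.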
Why Graphics Cards are Hacking the Future
In 2013, years before DeepMind would unveil an AI that would defeat one of the world's best Go players, the company published an influential paper showing how "deep reinforcement learning" could be used to teach computers to play Atari 2600 video games. GPUs were originally designed as specialized processors optimized to render the millions of pixels required to simulate 3D environments, but repurposing them to train artificial intelligence algorithms has been commonplace for a while. It wasn't until I read Andrej Karpathy's recent post on reinforcement learning, however, that something clicked about how interesting this is: graphics cards, originally designed for human vision of video games, are now being used for computer "vision" of video games. When I was growing up, getting a graphics card was kind of a Big Deal. The first one I got for Christmas in 1997 was a Pure3D Canopus Voodoo card based on the 3dfx chipset. It let me run Quake smoothly on my Pentium Compaq, which was a top priority of my life at the time.
This robot chooses which human victims it wants to inflict pain on
The threat of killer robots may sound a little far-fetched, but this latest 'harmful robot' suggests we may have taken a step closer to this dystopian reality. Roboticist Alexander Reben from the University of California, Berkeley, has created a bot called "The First Law" that is capable of pricking a finger, but is programmed to choose not to every time if it means avoiding being switched off. Ultimately, it can decide whether or not to inflict pain to serve its own interest. The robot is named after the first law in a set of rules devised by sci-fi author Isaac Asimov, which, quoted as being from the Handbook of Robotics, 2058 AD, states: "a robot may not injure a human being or, through inaction, allow a human being to come to harm". Reben's research paper explains how the robot operates in relation to "reinforcement learning agents" and how they are unlikely to behave optimally all the time.
Model-Free Episodic Control
Blundell, Charles, Uria, Benigno, Pritzel, Alexander, Li, Yazhe, Ruderman, Avraham, Leibo, Joel Z., Rae, Jack, Wierstra, Daan, Hassabis, Demis
State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
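The abstract does not spell out the model, so the following is only a hedged sketch of the idea as commonly described for this line of work. For each action, a table remembers the highest return ever obtained after taking that action in a given state. For a state never seen before, the value is estimated by averaging the k nearest stored states. Assumptions: states arrive already embedded as numeric tuples, distance is Euclidean, and an empty table returns 0.0. `EpisodicTable` is a hypothetical name.

```python
import math

class EpisodicTable:
    """Hypothetical sketch of a per-action episodic control table."""

    def __init__(self, num_actions, k=3):
        self.k = k
        # One dict per action: state embedding (tuple) -> best return seen so far.
        self.tables = [dict() for _ in range(num_actions)]

    def update(self, state, action, ret):
        table = self.tables[action]
        # Keep the highest return ever observed after `action` in `state`.
        table[state] = max(table.get(state, -math.inf), ret)

    def estimate(self, state, action):
        table = self.tables[action]
        if state in table:
            return table[state]
        if not table:
            return 0.0  # assumption: default value for an empty table
        # Unseen state: average the values of the k nearest stored states.
        nearest = sorted(table, key=lambda s: math.dist(s, state))[: self.k]
        return sum(table[s] for s in nearest) / len(nearest)

    def act(self, state):
        # Greedy choice over the per-action estimates.
        values = [self.estimate(state, a) for a in range(len(self.tables))]
        return max(range(len(values)), key=values.__getitem__)

table = EpisodicTable(num_actions=2, k=2)
table.update((0.0, 0.0), action=0, ret=1.0)
table.update((0.0, 0.0), action=0, ret=5.0)  # higher return replaces the old one
table.update((1.0, 0.0), action=1, ret=3.0)
table.update((0.0, 1.0), action=1, ret=1.0)
chosen = table.act((0.0, 0.0))  # action 0 (stored 5.0) beats action 1 (k-NN estimate 2.0)
```

Keeping the maximum rather than an average is what lets a single lucky, highly rewarding episode be exploited immediately, which matches the "rapid learning" motivation in the abstract.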
This Week's Awesome Stories From Around the Web (Through June 11th)
ROBOTICS: Vyo Is a Fascinating and Unique Take on Social Domestic Robots (Evan Ackerman, IEEE Spectrum)
"Vyo is 'a personal assistant serving as a centralized interface for smart home devices.' Nothing new there, but what sets Vyo apart is how you interact with it: it combines non-anthropomorphic design with anthropomorphic expressiveness and a tactile object-based control system into a social robot that's totally, adorably different."
ARTIFICIAL INTELLIGENCE: The AI Machines Undergoing Behavioral Psychology Tests (Technology Review)
"The team says the best performing AI system uses deep reinforcement learning enhanced with additional memory. These machines retrieve relevant memories based on the context in which they were stored and in which the device finds itself. That's different from many existing memory systems that do not rely on context for memory retrieval."
INTERNET: A Computer Tried (and Failed) to Write This Article (Adrienne LaFrance, The Atlantic)
"Here I am, a human, writing a story that was assigned to a machine."
Google developing 'kill switch' to stop robot uprising against humans
The researchers referenced a robot that learned how to pause a game of Tetris to avoid losing, adding that AIs are "unlikely to behave optimally all the time". "We have proposed a framework to allow a human operator to repeatedly safely interrupt a reinforcement learning agent while making sure the agent will not learn to prevent or induce these interruptions," the paper concluded. "Safe interruptibility can be useful to take control of a robot that is misbehaving and may lead to irreversible consequences, or to take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform or would not normally receive rewards for this."
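The paper's framework is not reproduced here. As a hedged illustration of why the learning rule matters when a human overrides an agent, the sketch below uses a tabular agent with a made-up two-action Q-table. In this setup an interruption simply forces the action "left". An off-policy Q-learning target bootstraps from the best next action, so the forced action does not enter it. An on-policy SARSA target bootstraps from the action actually executed, so the interruption leaks into what the agent learns. All names and values are hypothetical.

```python
import random

ACTIONS = ("left", "right")

def choose_action(Q, state, interrupted, rng, epsilon=0.1):
    # Hypothetical interruption: the operator overrides the agent and forces "left".
    if interrupted:
        return "left"
    # Otherwise act epsilon-greedily with respect to Q.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_target(Q, reward, next_state, gamma):
    # Off-policy target: bootstraps from the best next action, so it does not
    # depend on which action (interrupted or not) is actually executed next.
    return reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)

def sarsa_target(Q, reward, next_state, next_action, gamma):
    # On-policy target: bootstraps from the action actually executed next,
    # including one forced by an interruption.
    return reward + gamma * Q[(next_state, next_action)]

rng = random.Random(0)
Q = {("s1", "left"): 0.0, ("s1", "right"): 1.0}
forced = choose_action(Q, "s1", interrupted=True, rng=rng)
q_target = q_learning_target(Q, reward=0.0, next_state="s1", gamma=0.9)
sarsa_tgt = sarsa_target(Q, reward=0.0, next_state="s1", next_action=forced, gamma=0.9)
```

Here the Q-learning target stays at 0.9 (the value of the better action), while the SARSA target drops to 0.0 because it bootstraps from the forced "left" action. The interrupted behaviour is pulled into the agent's value estimates, which is the kind of effect the quoted framework aims to rule out.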
Deep Successor Reinforcement Learning
Kulkarni, Tejas D., Saeedi, Ardavan, Gautam, Simanta, Gershman, Samuel J.
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components: a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state, and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties, including increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations: simple grid-world domains (MazeBase) and the Doom game engine.
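The factorization described in the abstract can be sketched in its simplest tabular form. This is not DSR itself, which learns both parts with deep networks from pixels. Assumptions: a tiny MDP with a known transition matrix `P` under a fixed policy, one-hot state features (so the reward weights are just per-state rewards), and reward collected on occupying a state. The successor matrix is M = Σ_t γ^t P^t = (I − γP)^{-1}, and the value is V = M w.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def successor_matrix(P, gamma, iters=500):
    # Tabular successor representation under a fixed policy with transition
    # matrix P: M = sum_t gamma^t P^t = (I - gamma P)^{-1}, computed here by
    # iterating the fixed point M <- I + gamma * P @ M.
    n = len(P)
    I = identity(n)
    M = identity(n)
    for _ in range(iters):
        M = [[I[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

def value_from_sr(M, w):
    # V(s) = <M[s], w>: inner product of the successor row and reward weights.
    return [sum(m * r for m, r in zip(row, w)) for row in M]

# Three-state chain 0 -> 1 -> 2, where state 2 is absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
M = successor_matrix(P, gamma=0.9)
V = value_from_sr(M, [0.0, 0.0, 1.0])        # reward only in state 2
V_moved = value_from_sr(M, [1.0, 0.0, 0.0])  # reward moved: M is reused unchanged
```

Because the dynamics live in `M` and the rewards live in `w`, changing where reward appears only requires a new inner product, not relearning the successor map. This is the property the abstract credits for sensitivity to distal reward changes.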
Deep Reinforcement Learning with a Natural Language Action Space
He, Ji, Chen, Jianshu, He, Xiaodong, Gao, Jianfeng, Li, Lihong, Deng, Li, Ostendorf, Mari
This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based games. Termed a deep reinforcement relevance network (DRRN), the architecture represents action and state spaces with separate embedding vectors, which are combined with an interaction function to approximate the Q-function in reinforcement learning. We evaluate the DRRN on two popular text games, showing superior performance over other deep Q-learning architectures. Experiments with paraphrased action descriptions show that the model is extracting meaning rather than simply memorizing strings of text.
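A minimal sketch of the DRRN idea follows: state text and each candidate action text get separate embeddings, and an interaction function combines them into one Q-value per action. Several details are assumptions, not the paper's exact architecture: a bag-of-words featurizer over a tiny made-up vocabulary, a single tanh layer per side, an inner product as the interaction function, and random untrained weights. The Q-learning training loop is not shown.

```python
import math
import random

def bag_of_words(text, vocab):
    # Hypothetical featurizer: word counts over a fixed vocabulary.
    counts = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            counts[vocab[word]] += 1.0
    return counts

def embed(x, W):
    # One tanh layer: h = tanh(W x).
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def q_values(state_text, action_texts, vocab, W_state, W_action):
    # Separate networks embed the state and each candidate action;
    # the interaction function here is an inner product.
    h_s = embed(bag_of_words(state_text, vocab), W_state)
    qs = []
    for a in action_texts:
        h_a = embed(bag_of_words(a, vocab), W_action)
        qs.append(sum(s * t for s, t in zip(h_s, h_a)))
    return qs

words = "you are in a dark room door open go north take lamp".split()
vocab = {w: i for i, w in enumerate(words)}
rng = random.Random(0)
hidden = 4
W_state = [[rng.gauss(0, 0.5) for _ in vocab] for _ in range(hidden)]
W_action = [[rng.gauss(0, 0.5) for _ in vocab] for _ in range(hidden)]
actions = ["open door", "go north", "take lamp"]
qs = q_values("You are in a dark room", actions, vocab, W_state, W_action)
best = actions[max(range(len(qs)), key=qs.__getitem__)]
```

Scoring each (state, action) pair separately lets the model handle a different, variable-length list of candidate actions at every step, as text games typically present.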