Reinforcement Learning
Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space
This paper proposes a novel deep reinforcement learning architecture inspired by previous tree-structured architectures, which were only usable in discrete action spaces. Policy Prediction Network offers a way to improve sample complexity and performance on continuous control problems in exchange for extra computation at training time, but at no extra computational cost at rollout time. Our approach combines model-free and model-based reinforcement learning. Policy Prediction Network is the first to introduce implicit model-based learning to Policy Gradient algorithms for continuous action spaces, and it is made possible by an empirically justified clipping scheme. Our experiments focus on the MuJoCo environments so that they can be compared with similar work in this area.
Reinforcement Learning - The Value Function
Code and a demo are available. This article explores what states, actions, and rewards are in reinforcement learning, and how an agent can learn through simulation to determine the best action to take in any given state. After a long day at work, you are deciding between two choices: to head home and write an article, or to hang out with friends at a bar. If you choose to hang out with friends, they will make you feel happy; whereas if you head home to write an article, you'll end up feeling tired after a long day at work. In this example, enjoying yourself is a reward and feeling tired is viewed as a negative reward, so why write articles?
Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG)
Typically, an RL setup is composed of two components: an agent and an environment. The environment refers to the object that the agent is acting on (e.g. the game itself in an Atari game), while the agent represents the RL algorithm. The environment starts by sending a state to the agent, which then takes an action in response to that state based on its knowledge. After that, the environment sends the next state and a reward back to the agent. The agent updates its knowledge with the reward returned by the environment to evaluate its last action.
Unsupervised Learning Will Bring About The Next AI Revolution
A 6-month-old baby won't even notice if a toy truck drives off a platform and seems to fly in the air. However, if the same experiment is repeated 2 to 3 months later, the baby will immediately identify that something is wrong. This means that the baby has already learned the concept of gravity. "Nobody tells a baby that objects are supposed to fall," said the chief AI scientist at Facebook and a professor at NYU, Dr. Yann LeCun, during a webinar organized by the Association for Computing Machinery, an industry body. Because babies do not have very sophisticated motor control, LeCun hypothesizes, "a lot of what they learn about the world is through observation."
Learning to Learn with Probabilistic Task Embeddings
To operate successfully in a complex and changing environment, learning agents must be able to acquire new skills quickly. Humans display remarkable skill in this area -- we can learn to recognize a new object from one example, adapt to driving a different car in a matter of minutes, and add a new slang word to our vocabulary after hearing it once. Meta-learning is a promising approach for enabling such capabilities in machines. In this paradigm, the agent adapts to a new task from limited data by leveraging a wealth of experience collected in performing related tasks. For agents that must take actions and collect their own experience, meta-reinforcement learning (meta-RL) holds the promise of enabling fast adaptation to new scenarios.
Introduction to Various Reinforcement Learning Algorithms
The environment refers to the object that the agent is acting on (e.g. the game itself in an Atari game), while the agent represents the RL algorithm. The environment starts by sending a state to the agent, which then takes an action in response to that state based on its knowledge. After that, the environment sends the next state and a reward back to the agent. The agent updates its knowledge with the reward returned by the environment to evaluate its last action. The loop continues until the environment sends a terminal state, which ends the episode.
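The agent-environment loop described in this snippet can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the article: `CorridorEnv`, `TabularAgent`, and `run_episode` are hypothetical names, the corridor task and its reward values are made up for the example, and the agent's "knowledge" is assumed to be a table of action values updated with a Q-learning style rule.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: reach the right end of a short corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start of an episode: the environment sends the initial state.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left (clipped to the corridor).
        move = 1 if action == 1 else -1
        self.position = min(self.length - 1, max(0, self.position + move))
        done = self.position == self.length - 1  # terminal state ends the episode
        reward = 1.0 if done else -0.1           # assumed reward values
        return self.position, reward, done


class TabularAgent:
    """Hypothetical agent whose knowledge is a table of action values."""

    def __init__(self, n_states, n_actions=2, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Mostly act on current knowledge, occasionally explore at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state, done):
        # Use the returned reward to re-evaluate the last action.
        future = 0.0 if done else self.gamma * max(self.q[next_state])
        target = reward + future
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run_episode(env, agent, max_steps=1000):
    """Run the state -> action -> (next state, reward) loop until termination."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward


if __name__ == "__main__":
    env = CorridorEnv(length=5)
    agent = TabularAgent(n_states=5)
    for _ in range(200):
        run_episode(env, agent)
    print(agent.q)
```

Keeping the environment and the agent as separate objects mirrors the two-component setup in the text: the loop in `run_episode` is the only place where they exchange states, actions, and rewards.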
Node Injection Attacks on Graphs via Reinforcement Learning
Sun, Yiwei, Wang, Suhang, Tang, Xianfeng, Hsieh, Tsung-Yu, Honavar, Vasant
Real-world graph applications, such as advertisements and product recommendations, make profits by accurately classifying the labels of nodes. However, in such scenarios, adversaries have strong incentives to attack such graphs to reduce node classification performance. Previous work on graph adversarial attacks focuses on modifying existing graph structures, which is infeasible in most real-world applications. In contrast, it is more practical to inject adversarial nodes into existing graphs, which can also potentially reduce the performance of the classifier. In this paper, we study the novel node injection poisoning attack problem, which aims to poison the graph. We describe a reinforcement learning based method, namely NIPA, to sequentially modify the adversarial information of the injected nodes. We report the results of experiments on several benchmark data sets that show the superior performance of the proposed method, NIPA, relative to existing state-of-the-art methods.
Driving in Dense Traffic with Model-Free Reinforcement Learning
Saxena, Dhruv Mauria, Bae, Sangjae, Nakhaei, Alireza, Fujimura, Kikuo, Likhachev, Maxim
Traditional planning and control methods could fail to find a feasible trajectory for an autonomous vehicle to execute amongst dense traffic on roads. This is because the obstacle-free volume in spacetime that the vehicle can drive through is very small in these scenarios. However, that does not mean the task is infeasible, since human drivers are known to be able to drive amongst dense traffic by leveraging the cooperativeness of other drivers to open a gap. The traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move into. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation.
Modeling Collaboration for Robot-assisted Dressing Tasks
Clegg, Alexander, Kemp, Charles C., Turk, Greg, Liu, C. Karen
Abstract: We investigated the application of haptic-aware feedback control and deep reinforcement learning to robot-assisted dressing in simulation. We did so by modeling both human and robot control policies as separate neural networks and training them both via TRPO. We show that co-optimization, training separate human and robot control policies simultaneously, can be a valid approach to finding successful strategies for human/robot cooperation on assisted dressing tasks. Typical tasks are putting on one or both sleeves of a hospital gown or pulling on a T-shirt. We also present a method for modeling human dressing behavior under variations in capability including: unilateral muscle weakness, dyskinesia, and limited range of motion. Using this method and behavior model, we demonstrate discovery of successful strategies for a robot to assist humans with a variety of capability limitations. Introduction: It becomes ever more likely that robots will be found in homes and businesses, physically interacting with the humans they encounter. With this in mind, researchers have begun preparing robots for the physical interaction tasks which they will face in a human world. Dressing tasks in particular present a multitude of privacy, safety, and independence concerns which strongly motivate the application of robotic assistance [1]. However, clothing exhibits complex dynamics and often occludes the body, making it difficult to accurately observe the task state and predict the results of planned interactions. These challenges are compounded by the risk of injuring the human or damaging the robot as well as the sparsity of data that could be collected during physical task exploration.
Artificial Intelligence May Help Slow Down the Aging Process - Conduct Science
They believe the new technology has the potential to transform the pharmaceutical industry, especially when it comes to research on the aging process and how we can slow it. The research team uses a combination of Reinforcement Learning (RL) and Generative Adversarial Networks (GANs) to improve drug development. They show that this process is more effective than the previous Hit to Lead (H2L) method. The difference is most obvious when comparing the time that H2L and AI each take to develop new drugs.