Reinforcement Learning
Reinforcement Learning with a Corrupted Reward Channel
Everitt, Tom, Krakovna, Victoria, Orseau, Laurent, Hutter, Marcus, Legg, Shane
No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
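The abstract's example and its second remedy can be illustrated with a toy sketch. This is not the paper's algorithm: the ten states, the reward values, the threshold and the helper name `pick_above_threshold` are all hypothetical, chosen only to show how a greedy agent is drawn to a corrupt state while a randomised choice among "good enough" states limits the damage.

```python
import random

def pick_above_threshold(observed_reward, threshold, rng):
    """Choose uniformly at random among states whose observed reward
    is at least `threshold`, instead of taking the single maximum."""
    candidates = [s for s, r in observed_reward.items() if r >= threshold]
    return rng.choice(candidates)

# Hypothetical toy problem: states 0..8 are honest, state 9 is corrupt.
true_reward = {s: 0.8 for s in range(9)}
true_reward[9] = 0.0                      # the true reward is actually small
observed_reward = dict(true_reward)
observed_reward[9] = 1.0                  # a sensory error reports the maximum

# A greedy agent maximises the observed reward and lands on the corrupt state.
greedy_state = max(observed_reward, key=observed_reward.get)

# A randomised agent spreads its choices over all states that look good enough.
rng = random.Random(0)
samples = [pick_above_threshold(observed_reward, 0.7, rng) for _ in range(1000)]
average_true_reward = sum(true_reward[s] for s in samples) / len(samples)

print("greedy picks state", greedy_state, "with true reward", true_reward[greedy_state])
print("randomised agent's average true reward:", average_true_reward)
```

In this toy setting the randomised agent still visits the corrupt state sometimes, which is in line with the abstract's claim that randomisation manages corruption only partially.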
Gartner's Hype Cycle for Emerging Technologies, 2017 Adds 5G And Deep Learning For First Time
The Hype Cycle for Emerging Technologies, 2017 provides insights gained from evaluations of more than 2,000 technologies the research and advisory firm tracks. From this large base of technologies, the technologies that show the most potential for delivering a competitive advantage over the next five to 10 years are included in the Hype Cycle. The eight technologies added to the Hype Cycle this year include 5G, Artificial General Intelligence, Deep Learning, Deep Reinforcement Learning, Digital Twin, Edge Computing, Serverless PaaS and Cognitive Computing. Ten technologies not included in the Hype Cycle for 2017 include 802.11ax, ... The three most dominant trends include Artificial Intelligence (AI) Everywhere, Transparently Immersive Experiences, and Digital Platforms.
What Types of Questions Can Data Science Answer?
As you may have gathered, the families of two-class classification, multi-class classification, anomaly detection, and regression are all closely related. Entirely different sets of data science questions belong in the extended algorithm families of unsupervised and reinforcement learning. Another family of unsupervised learning algorithms consists of dimensionality reduction techniques. ... A separate extended family is made up of reinforcement learning (RL) algorithms.
A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games
Leibfried, Felix, Kushman, Nate, Hofmann, Katja
Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as deep Q-networks, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment. In this paper, we take a step towards using model-based techniques in environments with a high-dimensional visual state space by demonstrating that it is possible to learn system dynamics and the reward structure jointly. Our contribution is to extend a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well. To this end, we phrase a joint optimization problem for minimizing both video frame and reward reconstruction loss, and adapt network parameters accordingly. Empirical evaluations on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these results as opening up important directions for model-based reinforcement learning in complex, initially unknown environments.
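The joint optimisation described in this abstract can be sketched as a single scalar objective. The specific choices below are assumptions made for illustration, not details confirmed by the abstract: mean squared error for frames, cross-entropy over discrete reward classes, and the hypothetical `reward_weight` parameter that trades the two terms off.

```python
import math

def frame_loss(predicted_frame, target_frame):
    """Mean squared error over flattened pixel intensities."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_frame, target_frame)]
    return sum(squared) / len(target_frame)

def reward_loss(reward_probs, true_reward_class):
    """Cross-entropy, treating the reward as one of a few discrete classes."""
    return -math.log(reward_probs[true_reward_class])

def joint_loss(predicted_frame, target_frame, reward_probs, true_reward_class,
               reward_weight=1.0):
    """Sum of both reconstruction losses; one optimiser step on this value
    would adapt the shared network parameters to both tasks at once."""
    return (frame_loss(predicted_frame, target_frame)
            + reward_weight * reward_loss(reward_probs, true_reward_class))

# Perfect frame prediction, but an uninformative reward prediction.
example = joint_loss([0.0, 1.0], [0.0, 1.0], [0.5, 0.5], 0)
print(example)
```

Because both terms feed one objective, a gradient step improves frame and reward prediction together, which is what "jointly" means here.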
Learning to Perform Physics Experiments via Deep Reinforcement Learning
Denil, Misha, Agrawal, Pulkit, Kulkarni, Tejas D, Erez, Tom, Battaglia, Peter, de Freitas, Nando
When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems; however, it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and the cost incurred by the agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations. We also compare our learned experimentation policies to randomized baselines and show that the learned policies lead to better predictions.
[Discussion] School choices for career in ML from non-traditional background • r/MachineLearning
Hello, I'm looking for some advice on school choices for someone from a non-traditional background (undergrad and current master in chemical engineering, focused on controls) for getting into the ML field. Currently doing 1st year of 2 in Master in chemical engineering, my research topic is applying reinforcement learning to optimal control problems in smart grid energy management/demand-side management. I've been learning ML and RL for the past 3 years, can currently keep up with papers, implement these papers in Tensorflow, Pytorch and working on some additional personal projects (Deep RL related). Ultimately I'd like to work in a ML/RL research or applied position (non-academic, in private company research labs). My current worry is that my chem eng background is a bit of a non-traditional background, and I'm not sure how much of that will hinder my goal for getting the jobs I'm aiming for.
Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies
Hein, Daniel, Hentschel, Alexander, Runkler, Thomas, Udluft, Steffen
Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Both requirements for automatic training processes are not found in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because online learning requires exploration of the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. Therefore, FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and it is expected that a relatively easily interpretable control policy exists. The efficiency of the proposed approach with problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.
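For readers unfamiliar with the optimiser named in this abstract, here is a minimal global-best particle swarm sketch. It is not the FPSRL implementation: the `sphere` cost is a hypothetical stand-in for evaluating a policy's parameters on a world model, and the swarm size, inertia and acceleration coefficients are assumed values.

```python
import random

def pso_minimise(cost, dim, n_particles=30, n_iters=200, bounds=(-5.0, 5.0),
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best particle swarm optimisation of `cost` over `dim` parameters."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]   # best position of the whole swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost

def sphere(params):
    """Hypothetical stand-in cost; a model-based setup would instead roll out
    the parameterised policy on the learned world model and return its cost."""
    return sum(v * v for v in params)

best_params, best_value = pso_minimise(sphere, dim=2)
print(best_params, best_value)
```

The search only ever calls `cost`, so no gradient of the policy or the model is needed and no interaction with the real system takes place during optimisation.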
Reinforcement learning for complex goals, using TensorFlow
Attention readers: We invite you to access the corresponding Python code and iPython notebooks for this article on GitHub. Reinforcement learning (RL) is about training agents to complete tasks. We typically think of this as being able to accomplish some goal. Take, for example, a robot we might want to train to open a door. Reinforcement learning can be used as a framework for teaching the robot to open the door by allowing it to learn from trial and error.
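Learning from trial and error can be made concrete with a small tabular Q-learning sketch. It is not the code from the article's GitHub repository and does not use TensorFlow: the five-position corridor with a "door" at the far end, the reward of 1 for reaching it, and all hyperparameters are hypothetical.

```python
import random

N_STATES = 5           # positions 0..4; reaching position 4 "opens the door"
GOAL = N_STATES - 1
ACTIONS = (0, 1)       # 0 = step left, 1 = step right

def step(state, action):
    """Move one position (walls at both ends); reward 1 only at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)       # trial: explore at random
            else:                                  # exploit, breaking ties randomly
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            nxt, reward = step(state, action)
            # Error correction: move the estimate toward reward plus
            # the discounted value of the best next action.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

The agent is never told which way the door is; the preference for stepping right emerges only from the rewards it happens to collect.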
UbuntuWorld 1.0 LTS - A Platform for Automated Problem Solving & Troubleshooting in the Ubuntu OS
Chakraborti, Tathagata, Talamadupula, Kartik, Fadnis, Kshitij P., Campbell, Murray, Kambhampati, Subbarao
In this paper we present UbuntuWorld 1.0 LTS - a platform for developing automated technical support agents in the Ubuntu operating system. Specifically, we propose to use the Bash terminal as a simulator of the Ubuntu environment for a learning-based agent, and demonstrate the usefulness of adopting reinforcement learning (RL) techniques for basic problem solving and troubleshooting in this environment. We provide a plug-and-play interface to the simulator as a Python package where different types of agents can be plugged in and evaluated, and provide pathways for integrating data from online support forums like Ask Ubuntu into an automated agent's learning process. Finally, we show that the use of this data significantly improves the agent's learning efficiency. We believe that this platform can be adopted as a real-world test bed for research on automated technical support.