Reinforcement Learning
Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents
Green, Michael Cerny, Sergent, Benjamin, Shandilya, Pushyami, Kumar, Vibhor
In this paper we propose a new training loop for deep reinforcement learning agents with an evolutionary generator. Evolutionary procedural content generation has been used in the creation of maps and levels for games before. Our system incorporates an evolutionary map generator to construct a training curriculum that is evolved to maximize loss within the state-of-the-art Double Dueling Deep Q Network architecture with prioritized replay (Wang et al. 2016; Schaul et al. 2015). We present a case study in which we show the efficacy of our new method on Attackers and Defenders, a game we created with a large, discrete action space. Our results demonstrate that training on an evolutionarily-curated curriculum (directed sampling) of maps both expedites training and improves generalization when compared to a network trained on an undirected sampling of maps.
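As a rough illustration of evolving a curriculum to maximize loss, the sketch below runs a small truncation-selection loop with elitism over toy maps. It is not the authors' implementation: `evolve_curriculum`, `toy_loss`, and `flip_one_tile` are hypothetical names, and the toy loss (a count of wall tiles) merely stands in for the agent's training loss on a map.

```python
import random

def evolve_curriculum(population, loss_fn, mutate, generations=10, keep=4, rng=None):
    """Evolve maps toward higher loss using truncation selection with elitism."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        # Rank maps by how hard they currently are (higher loss first).
        ranked = sorted(population, key=loss_fn, reverse=True)
        parents = ranked[:keep]
        # Keep the parents unchanged and refill the population with mutated copies.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - keep)]
        population = parents + children
    return sorted(population, key=loss_fn, reverse=True)

# Toy stand-ins (assumptions): a "map" is a list of 0/1 tiles and the loss
# counts wall tiles. A real system would query the agent's loss on each map.
def toy_loss(game_map):
    return sum(game_map)

def flip_one_tile(game_map, rng):
    child = list(game_map)          # copy so the parent map is left untouched
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

rng = random.Random(0)
initial = [[rng.randint(0, 1) for _ in range(8)] for _ in range(12)]
curriculum = evolve_curriculum(initial, toy_loss, flip_one_tile, rng=rng)
```

In a full training loop the evolution step would presumably be interleaved with agent updates, so that the maps with the highest current loss are the ones sampled for training.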
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Talpaert, Victor, Sobh, Ibrahim, Kiran, B Ravi, Mannion, Patrick, Yogamani, Senthil, El-Sallab, Ahmad, Perez, Patrick
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, a vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Monte Carlo in Reinforcement Learning, the easy way
In Dynamic Programming (DP) we have seen that in order to compute the value function on each state, we need to know the transition matrix as well as the reward system. But this is not always a realistic condition. It may be possible to have such a thing in some board games, but in video games and real-life problems like self-driving cars there is no way to know this information beforehand. If you recall the formula of the State-Value function from the "Math Behind Reinforcement Learning" article, it is not possible to compute V(s) because p(s', r | s, a) is now unknown to us. Always keep in mind that our goal is to find the policy that maximizes the reward for an agent.
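When p(s', r | s, a) is unknown, V(s) can still be estimated from sampled episodes by averaging the returns observed after visiting each state. Here is a minimal sketch of first-visit Monte Carlo policy evaluation. The function name `first_visit_mc` and the episode format (a list of `(state, reward)` pairs, where the reward is the one received on leaving that state) are assumptions made for this example.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) by averaging first-visit returns over sampled episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        g = 0.0
        # Walk backwards so g accumulates the discounted return from step t.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Two toy episodes: A -> B -> end, with a reward of 1 or 0 on leaving B.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
values = first_visit_mc(episodes)
```

No transition matrix appears anywhere in the code: the estimate comes purely from experience, which is the point of the Monte Carlo approach.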
MTSI Opens Artificial Intelligence Tech Research Hub
Modern Technology Solutions Inc. has opened a laboratory in Huntsville, Ala., for research and development of artificial intelligence-based technology platforms for the military sector. MTSI said Friday it looks to accomplish a holistic approach to AI application through the new lab along with the company's engineering and data analytics processes. Willie Maddox, manager of AI Lab, said the company aims to apply deep reinforcement learning to address challenges related to multiagent dynamic route planning. Alexandria, Va.-based MTSI offers engineering and technology services to government customers in the missile defense, cybersecurity, intelligence, unmanned and autonomous systems, aviation, space and homeland security areas.
Transfer Learning for Prosthetics Using Imitation Learning
Mohammedalamen, Montaser, Khamies, Waleed D., Rosman, Benjamin
In this paper, we apply reinforcement learning (RL) techniques to train a realistic biomechanical model to work with different people and in different walking environments. We benchmark three RL algorithms in the OpenSim environment: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). We also apply imitation learning to a prosthetics domain to reduce the training time needed to design customized prosthetics. We use the DDPG algorithm to train an original expert agent. We then propose a modification to the Dataset Aggregation (DAgger) algorithm to reuse the expert knowledge and train a new target agent to replicate that behaviour in fewer than 5 iterations, compared to the 100 iterations taken by the expert agent, which means reducing training time by 95%. Our modifications to the DAgger algorithm improve the balance between exploiting the expert policy and exploring the environment. We show empirically that these improve convergence time of the target agent, particularly when there is some degree of variation between expert and naive agent.
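For orientation, the following is a minimal sketch of the standard DAgger loop on a toy one-dimensional task. It does not include the authors' modification. All names (`dagger`, `expert`, `env_reset`, `env_step`, `fit_table`), the decaying mixing coefficient `beta`, and the lookup-table learner are assumptions chosen to keep the example self-contained.

```python
import random

def dagger(env_reset, env_step, expert, fit, iterations=5, horizon=10,
           beta=1.0, decay=0.5, seed=0):
    """Plain DAgger: roll out a mixed policy, relabel with the expert, refit."""
    rng = random.Random(seed)
    dataset = []       # aggregated (state, expert_action) pairs
    learner = expert   # stand-in until the first fit; unused while beta == 1.0
    for _ in range(iterations):
        state = env_reset(rng)
        for _ in range(horizon):
            # Every visited state is labeled with the expert's action.
            dataset.append((state, expert(state)))
            # Act with the expert with probability beta, otherwise the learner.
            actor = expert if rng.random() < beta else learner
            state = env_step(state, actor(state))
        learner = fit(dataset)
        beta *= decay  # hand more control to the learner in later iterations
    return learner

# Toy task (assumption): walk along the integers in [-5, 5] toward 0.
def expert(state):
    return (state < 0) - (state > 0)   # +1 below zero, -1 above, 0 at the goal

def env_reset(rng):
    return rng.randint(-5, 5)

def env_step(state, action):
    return max(-5, min(5, state + action))

def fit_table(dataset):
    table = dict(dataset)              # lookup-table "learner"
    return lambda state: table.get(state, 0)

learner = dagger(env_reset, env_step, expert, fit_table)
```

The `beta` schedule is where exploitation of the expert is traded against exploration by the learner. The abstract's 95% figure follows from the iteration counts: 1 - 5/100 = 0.95.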
Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning
Gao, Guanyu, Li, Jie, Wen, Yonggang
Heating, Ventilation, and Air Conditioning (HVAC) is extremely energy-consuming, accounting for 40% of total building energy consumption. Therefore, it is crucial to design energy-efficient building thermal control policies which can reduce the energy consumption of HVAC while maintaining the comfort of the occupants. However, implementing such a policy is challenging, because it involves various influencing factors in a building environment, which are usually hard to model and may be different from case to case. To address this challenge, we propose a deep reinforcement learning based framework for energy optimization and thermal comfort control in smart buildings. We formulate the building thermal control as a cost-minimization problem which jointly considers the energy consumption of HVAC and the thermal comfort of the occupants. To solve the problem, we first adopt a deep neural network based approach for predicting the occupants' thermal comfort, and then adopt Deep Deterministic Policy Gradients (DDPG) for learning the thermal control policy. To evaluate the performance, we implement a building thermal control simulation system and evaluate the performance under various settings. The experimental results show that our method can improve the thermal comfort prediction accuracy, and reduce the energy consumption of HVAC while improving the occupants' thermal comfort.
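One way to picture a cost that jointly considers energy and comfort is a weighted sum of energy use and a penalty for leaving a comfort band. The sketch below assumes that form; the paper's actual formulation may differ. The function name, the comfort band of (-0.5, 0.5), and the weight are all illustrative assumptions.

```python
def thermal_control_cost(energy_kwh, comfort_index,
                         comfort_range=(-0.5, 0.5), comfort_weight=1.0):
    """Hypothetical per-step cost: energy used plus a weighted comfort penalty."""
    low, high = comfort_range
    # Zero penalty inside the comfort band, growing linearly outside it.
    violation = max(low - comfort_index, 0.0, comfort_index - high)
    return energy_kwh + comfort_weight * violation

# A reinforcement learning agent would typically receive the negated cost as reward.
comfortable = thermal_control_cost(2.0, 0.0)
too_warm = thermal_control_cost(2.0, 1.0, comfort_weight=4.0)
```

Raising `comfort_weight` makes the policy favor occupant comfort over energy savings, and lowering it does the opposite.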
ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback
Beck, Jacob, Papakipos, Zoe, Littman, Michael
In autonomous vehicle (AV) control, allowing mistakes can be quite dangerous and costly in the real world. For this reason we investigate methods of training an AV without allowing the agent to explore and instead having a human explorer collect the data. Supervised learning has been explored for AV control, but it encounters the issue of covariate shift. That is, training data collected from an optimal demonstration consists only of the states induced by the optimal control policy, but at runtime, the trained agent may encounter a vastly different state distribution with little relevant training data. To mitigate this issue, we have our human explorer make sub-optimal decisions. In order to have our agent not replicate these sub-optimal decisions, supervised learning requires that we either erase these actions or replace them with the correct actions. Erasing is wasteful and replacing is difficult, since it is not easy to know the correct action without driving. We propose an alternate framework that includes continuous scalar feedback for each action, marking which actions we should replicate, which we should avoid, and how sure we are. Our framework learns continuous control from sub-optimal demonstration and evaluative feedback collected before training. We find that a human demonstrator can explore sub-optimal states in a safe manner, while still getting enough gradation to benefit learning. The collection method for data and feedback we call "Backseat Driver." We call the more general learning framework ReNeg, since it learns a regression from states to actions given negative as well as positive examples. We empirically validate several models in the ReNeg framework, testing on lane-following with limited data. We find that the best solution is a generalization of mean-squared error and outperforms supervised learning on the positive examples alone.
Improving Sepsis Treatment Strategies by Combining Deep and Kernel-Based Reinforcement Learning
Peng, Xuefeng, Ding, Yi, Wihl, David, Gottesman, Omer, Komorowski, Matthieu, Lehman, Li-wei H., Ross, Andrew, Faisal, Aldo, Doshi-Velez, Finale
Sepsis is the leading cause of mortality in the ICU. It is challenging to manage because individual patients respond differently to treatment. Thus, tailoring treatment to the individual patient is essential for the best outcomes. In this paper, we take steps toward this goal by applying a mixture-of-experts framework to personalize sepsis treatment. The mixture model selectively alternates between neighbor-based (kernel) and deep reinforcement learning (DRL) experts depending on the patient's current history. On a large retrospective cohort, this mixture-based approach outperforms physician, kernel-only, and DRL-only experts.
Reinforcement learning without gradients: evolving agents using Genetic Algorithms
During the holidays I wanted to ramp up my reinforcement learning skills. Knowing absolutely nothing about the field, I did a course where I was exposed to Q-learning and its "deep" equivalent (Deep-Q Learning). That's where I got exposed to OpenAI's Gym, where they have several environments for the agent to play in and learn from. The course was limited to Deep-Q learning, so as I read more on my own, I realized there are now better algorithms such as policy gradients and their variations (such as the Actor-Critic method).
Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game
Nechepurenko, Liudmyla, Voss, Viktor, Gritsenko, Vyacheslav
We compare a novel Knowledge-based Reinforcement Learning (KB-RL) approach with the traditional Neural Network (NN) method in solving a classical task of the Artificial Intelligence (AI) field. Neural networks have become very prominent in recent years and, combined with Reinforcement Learning, proved to be very effective for one of the frontier challenges in AI - playing the game of Go. Our experiment shows that a KB-RL system is able to outperform an NN in a task typical for NNs, such as optimizing a regression problem. Furthermore, KB-RL offers a range of advantages in comparison to the traditional Machine Learning methods. In particular, there is no need for a large dataset to start and succeed with this approach, its learning process takes considerably less effort, and its decisions are fully controllable, explicit and predictable.