AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents

Green, Michael Cerny, Sergent, Benjamin, Shandilya, Pushyami, Kumar, Vibhor

arXiv.org Artificial Intelligence, Jan-16-2019

In this paper we propose a new training loop for deep reinforcement learning agents with an evolutionary generator. Evolutionary procedural content generation has been used in the creation of maps and levels for games before. Our system incorporates an evolutionary map generator to construct a training curriculum that is evolved to maximize loss within the state-of-the-art Double Dueling Deep Q Network architecture with prioritized replay (Wang et al. 2016) (Schaul et al. 2015). We present a case-study in which we prove the efficacy of our new method on a game with a discrete, large action space we made called Attackers and Defenders. Our results demonstrate that training on an evolutionarily-curated curriculum (directed sampling) of maps both expedites training and improves generalization when compared to a network trained on an undirected sampling of maps.
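The idea of evolving a curriculum toward high agent loss can be illustrated with a toy loop. This is only a minimal sketch, not the authors' system: the level representation (a single number), the `agent_loss` stand-in, and the names `evolve_curriculum`, `keep`, and `sigma` are all hypothetical. In the paper's setting the loss would come from the Q-network and the individuals would be game maps.

```python
import random


def agent_loss(level, skill):
    """Hypothetical stand-in for the agent's training loss on a level.

    A level is reduced to one difficulty number; loss grows as the level
    moves away from what the agent has already mastered (skill).
    """
    return abs(level - skill)


def evolve_curriculum(skill, population, rng, generations=10, keep=4, sigma=0.5):
    """Evolve levels toward high agent loss: mutate, score, keep the hardest."""
    for _ in range(generations):
        # Mutation: each parent produces one perturbed child.
        children = [lvl + rng.gauss(0.0, sigma) for lvl in population]
        # Elitist selection: parents compete with children on agent loss.
        candidates = population + children
        candidates.sort(key=lambda lvl: agent_loss(lvl, skill), reverse=True)
        population = candidates[:keep]
    return population


rng = random.Random(0)
initial = [rng.uniform(-1.0, 1.0) for _ in range(4)]
curriculum = evolve_curriculum(skill=0.0, population=initial, rng=rng)
```

Because parents are kept in the candidate pool, the average loss of the selected levels cannot decrease from one generation to the next. A real system would presumably also bound difficulty so that levels stay playable; that constraint is left out here.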

generator, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1901.05431

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre:

Research Report > New Finding (0.54)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Exploring applications of deep reinforcement learning for real-world autonomous driving systems

Talpaert, Victor, Sobh, Ibrahim, Kiran, B Ravi, Mannion, Patrick, Yogamani, Senthil, El-Sallab, Ahmad, Perez, Patrick

arXiv.org Machine Learning, Jan-16-2019

Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, a vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.

arxiv preprint arxiv, reinforcement, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1901.01536

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France (0.04)
Europe > Ireland (0.04)
(8 more...)

Genre: Overview (0.86)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Monte Carlo in Reinforcement Learning, the easy way

#artificialintelligence, Jan-15-2019, 16:07:13 GMT

In Dynamic Programming (DP) we saw that computing the value function for each state requires knowing the transition matrix as well as the reward system. But this is not always a realistic condition. It may be possible in some board games, but in video games and real-life problems such as self-driving cars there is no way to know this information beforehand. If you recall the formula of the state-value function from the "Math Behind Reinforcement Learning" article, it is not possible to compute V(s) because p(s', r | s, a) is now unknown to us. Always keep in mind that our goal is to find the policy that maximizes the reward for an agent.
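When the transition model is unknown, Monte Carlo methods estimate V(s) by averaging the returns observed in sampled episodes. The sketch below shows first-visit Monte Carlo prediction on a small random walk. The environment, the function names `sample_episode` and `first_visit_mc`, and all parameter values are assumptions made for illustration, not taken from the article.

```python
import random
from collections import defaultdict


def sample_episode(rng, start=2, n_states=5):
    """Hypothetical environment: a random walk on states 0..n_states-1.

    Both end states are terminal. Stepping into the right end pays reward 1,
    every other step pays 0. Returns a list of (state, reward) pairs, where
    reward is what the agent receives on leaving that state.
    """
    state, episode = start, []
    while 0 < state < n_states - 1:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == n_states - 1 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def first_visit_mc(num_episodes=20000, gamma=1.0, seed=0):
    """Estimate V(s) as the average return following the first visit to s."""
    rng = random.Random(seed)
    return_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(rng)
        # Walk the episode backwards, accumulating the discounted return.
        # Overwriting per state leaves the return from the earliest visit.
        g = 0.0
        first_visit_return = {}
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            return_sum[state] += g_first
            visit_count[state] += 1
    return {s: return_sum[s] / visit_count[s] for s in sorted(visit_count)}


values = first_visit_mc()
```

No transition probabilities appear anywhere in `first_visit_mc`; it only needs the ability to sample episodes. For this symmetric walk the exact values of states 1, 2, and 3 are 0.25, 0.5, and 0.75, so the estimates in `values` should land close to those numbers.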

artificial intelligence, machine learning, reinforcement learning, (16 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.37)

MTSI Opens Artificial Intelligence Tech Research Hub

#artificialintelligence, Jan-15-2019, 12:46:19 GMT

Modern Technology Solutions Inc. has opened a laboratory in Huntsville, Ala., for research and development of artificial intelligence-based technology platforms for the military sector. MTSI said Friday it looks to accomplish a holistic approach to AI application through the new lab along with the company's engineering and data analytics processes. Willie Maddox, manager of AI Lab, said the company aims to apply deep reinforcement learning to address challenges related to multiagent dynamic route planning. Alexandria, Va.-based MTSI offers engineering and technology services to government customers in the missile defense, cybersecurity, intelligence, unmanned and autonomous systems, aviation, space and homeland security areas.

artificial intelligence tech research hub, machine learning, reinforcement learning

#artificialintelligence

Country:

North America > United States > Virginia > Alexandria County > Alexandria (0.34)
North America > United States > Alabama > Madison County > Huntsville (0.34)

Industry: Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)

Transfer Learning for Prosthetics Using Imitation Learning

Mohammedalamen, Montaser, Khamies, Waleed D., Rosman, Benjamin

arXiv.org Artificial Intelligence, Jan-15-2019

In this paper, we apply reinforcement learning (RL) techniques to train a realistic biomechanical model to work with different people and in different walking environments. We benchmark 3 RL algorithms: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) in the OpenSim environment. We also apply imitation learning to a prosthetics domain to reduce the training time needed to design customized prosthetics. We use the DDPG algorithm to train an original expert agent. We then propose a modification to the Dataset Aggregation (DAgger) algorithm to reuse the expert knowledge and train a new target agent to replicate that behaviour in fewer than 5 iterations, compared to the 100 iterations taken by the expert agent, which means reducing training time by 95%. Our modifications to the DAgger algorithm improve the balance between exploiting the expert policy and exploring the environment. We show empirically that these improve convergence time of the target agent, particularly when there is some degree of variation between expert and naive agent.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1901.04772

Country:

Asia > Singapore (0.05)
Africa > Sudan > Khartoum State > Khartoum (0.05)
Africa > Sudan > Khartoum (0.05)

Genre: Research Report (0.40)

Industry: Health & Medicine > Health Care Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning

Gao, Guanyu, Li, Jie, Wen, Yonggang

arXiv.org Artificial Intelligence, Jan-15-2019

Heating, Ventilation, and Air Conditioning (HVAC) is extremely energy-consuming, accounting for 40% of total building energy consumption. Therefore, it is crucial to design some energy-efficient building thermal control policies which can reduce the energy consumption of HVAC while maintaining the comfort of the occupants. However, implementing such a policy is challenging, because it involves various influencing factors in a building environment, which are usually hard to model and may be different from case to case. To address this challenge, we propose a deep reinforcement learning based framework for energy optimization and thermal comfort control in smart buildings. We formulate the building thermal control as a cost-minimization problem which jointly considers the energy consumption of HVAC and the thermal comfort of the occupants. To solve the problem, we first adopt a deep neural network based approach for predicting the occupants' thermal comfort, and then adopt Deep Deterministic Policy Gradients (DDPG) for learning the thermal control policy. To evaluate the performance, we implement a building thermal control simulation system and evaluate the performance under various settings. The experiment results show that our method can improve the thermal comfort prediction accuracy, and reduce the energy consumption of HVAC while improving the occupants' thermal comfort.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1901.04693

Country:

Asia > Singapore (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report (0.70)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Construction & Engineering > HVAC (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback

Beck, Jacob, Papakipos, Zoe, Littman, Michael

arXiv.org Machine Learning, Jan-15-2019

In autonomous vehicle (AV) control, allowing mistakes can be quite dangerous and costly in the real world. For this reason we investigate methods of training an AV without allowing the agent to explore and instead having a human explorer collect the data. Supervised learning has been explored for AV control, but it encounters the issue of covariate shift. That is, training data collected from an optimal demonstration consists only of the states induced by the optimal control policy, but at runtime, the trained agent may encounter a vastly different state distribution with little relevant training data. To mitigate this issue, we have our human explorer make sub-optimal decisions. In order to have our agent not replicate these sub-optimal decisions, supervised learning requires that we either erase these actions, or replace these actions with the correct action. Erasing is wasteful and replacing is difficult, since it is not easy to know the correct action without driving. We propose an alternate framework that includes continuous scalar feedback for each action, marking which actions we should replicate, which we should avoid, and how sure we are. Our framework learns continuous control from sub-optimal demonstration and evaluative feedback collected before training. We find that a human demonstrator can explore sub-optimal states in a safe manner, while still getting enough gradation to benefit learning. The collection method for data and feedback we call "Backseat Driver." We call the more general learning framework ReNeg, since it learns a regression from states to actions given negative as well as positive examples. We empirically validate several models in the ReNeg framework, testing on lane-following with limited data. We find that the best solution is a generalization of mean-squared error and outperforms supervised learning on the positive examples alone.

loss function, negative example, positive example, (15 more...)

arXiv.org Machine Learning

1901.05101

Country: North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)

Improving Sepsis Treatment Strategies by Combining Deep and Kernel-Based Reinforcement Learning

Peng, Xuefeng, Ding, Yi, Wihl, David, Gottesman, Omer, Komorowski, Matthieu, Lehman, Li-wei H., Ross, Andrew, Faisal, Aldo, Doshi-Velez, Finale

arXiv.org Machine Learning, Jan-15-2019

Sepsis is the leading cause of mortality in the ICU. It is challenging to manage because individual patients respond differently to treatment. Thus, tailoring treatment to the individual patient is essential for the best outcomes. In this paper, we take steps toward this goal by applying a mixture-of-experts framework to personalize sepsis treatment. The mixture model selectively alternates between neighbor-based (kernel) and deep reinforcement learning (DRL) experts depending on the patient's current history. On a large retrospective cohort, this mixture-based approach outperforms physician, kernel-only, and DRL-only experts.

history, kernel policy, mortality, (14 more...)

arXiv.org Machine Learning

1901.0467

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Reinforcement learning without gradients: evolving agents using Genetic Algorithms

#artificialintelligence, Jan-14-2019, 06:33:05 GMT

During the holidays I wanted to ramp up my reinforcement learning skills. Knowing absolutely nothing about the field, I did a course where I was exposed to Q-learning and its "deep" equivalent (Deep-Q Learning). That's where I got exposed to OpenAI's Gym, which has several environments for the agent to play in and learn from. The course was limited to Deep-Q learning, so as I read more on my own, I realized there are now better algorithms such as policy gradients and their variations (such as the Actor-Critic method).

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game

Nechepurenko, Liudmyla, Voss, Viktor, Gritsenko, Vyacheslav

arXiv.org Artificial Intelligence, Jan-14-2019

We compare a novel Knowledge-based Reinforcement Learning (KB-RL) approach with the traditional Neural Network (NN) method in solving a classical task of the Artificial Intelligence (AI) field. Neural networks have become very prominent in recent years and, combined with Reinforcement Learning, proved to be very effective for one of the frontier challenges in AI: playing the game of Go. Our experiment shows that a KB-RL system is able to outperform an NN in a task typical for NNs, such as optimizing a regression problem. Furthermore, KB-RL offers a range of advantages in comparison to traditional Machine Learning methods. In particular, there is no need for a large dataset to start and succeed with this approach, its learning process takes considerably less effort, and its decisions are fully controllable, explicit and predictable.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1901.04626

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Leisure & Entertainment > Games > Go (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
