AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Researchers' deep learning algorithm solves Rubik's Cube faster than any human

#artificialintelligence | Jul-17-2019, 13:02:41 GMT

Since its invention by a Hungarian architect in 1974, the Rubik's Cube has furrowed the brows of many who have tried to solve it, but the 3-D logic puzzle is no match for an artificial intelligence system created by researchers at the University of California, Irvine. DeepCubeA, a deep reinforcement learning algorithm programmed by UCI computer scientists and mathematicians, can find the solution in a fraction of a second, without any specific domain knowledge or in-game coaching from humans. This is no simple task: the cube has more than 43 quintillion (about 4.3 × 10^19) possible configurations but only one goal state (each of its six sides displaying a solid color), which is practically impossible to reach through random moves. For a study published today in Nature Machine Intelligence, the researchers demonstrated that DeepCubeA solved 100 percent of all test configurations, finding the shortest path to the goal state about 60 percent of the time. The algorithm also works on other combinatorial games such as the sliding tile puzzle, Lights Out and Sokoban.
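
The article does not say how DeepCubeA searches for a solution. The published paper pairs a learned cost-to-go network with a weighted A* search that expands states in order of f = λ·g + h. What follows is a minimal sketch of that search idea only, on the 3×3 sliding tile puzzle mentioned above. It assumes a hand-written Manhattan-distance heuristic in place of the trained network, and `weighted_astar`, `manhattan` and the other names are hypothetical.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank tile
SIDE = 3

def manhattan(state):
    """Stand-in heuristic: sum of each tile's distance to its goal cell.
    (DeepCubeA would use a trained network's cost-to-go estimate here.)"""
    dist = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = tile - 1
        dist += abs(idx // SIDE - goal_idx // SIDE) + abs(idx % SIDE - goal_idx % SIDE)
    return dist

def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, SIDE)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < SIDE and 0 <= c < SIDE:
            swap = r * SIDE + c
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)

def weighted_astar(start, heuristic=manhattan, weight=1.0):
    """Expand states in order of f = weight * g + h.
    weight = 1 is plain A* (optimal with an admissible heuristic);
    weight < 1 leans on the heuristic, which may return longer paths."""
    frontier = [(heuristic(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        if g > cost[state]:
            continue  # stale queue entry; a cheaper path was found later
        for nxt in neighbors(state):
            new_g = g + 1
            if new_g < cost.get(nxt, float("inf")):
                cost[nxt] = new_g
                parent[nxt] = state
                heapq.heappush(frontier, (weight * new_g + heuristic(nxt), new_g, nxt))
    return None

scrambled = (1, 2, 3, 4, 5, 6, 0, 7, 8)
print(len(weighted_astar(scrambled)) - 1, "moves")
```

With weight=1.0 and an admissible heuristic such as Manhattan distance, the returned path is optimal. Lowering the weight trades path length for fewer expansions.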

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States > California > Orange County > Irvine (0.26)

Genre: Research Report > New Finding (0.37)

Industry: Leisure & Entertainment > Games > Rubik's Cube (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Artificial Intelligence (AI) in Machine Learning

#artificialintelligence | Jul-17-2019, 06:25:29 GMT

Reinforcement learning is a learning method that interacts with its environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Machine learning enables the analysis of massive quantities of data; while it generally delivers faster, more accurate results for identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. Combining machine learning with AI and cognitive technologies can make it even more effective at processing large volumes of information.
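
Delayed reward means the consequences of an action may only appear many steps later. The paragraph above does not name a method for handling this. A common one, assumed here for illustration, is to credit earlier steps with the discounted return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … . The hypothetical helper `discounted_returns` below computes it backward over one episode.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step t of one episode.
    rewards[t] is the reward received after the action at step t,
    so G_t = rewards[t] + gamma * G_{t+1}, computed from the end backward."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A reward that arrives only at the end still propagates back to earlier steps.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
```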

artificial intelligence, machine learning, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Zermelo's problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning

Biferale, Luca, Bonaccorso, Fabio, Buzzicotti, Michele, Di Leoni, Patricio Clark, Gustavsson, Kristian

arXiv.org Artificial Intelligence | Jul-17-2019

Finding the path that minimizes the time needed to navigate between two given points in a fluid flow is known as Zermelo's problem. Here, we investigate it using a Reinforcement Learning (RL) approach for the case of a vessel that has a slip velocity with fixed intensity, V_s, but variable direction, navigating in a 2D turbulent sea. We use an Actor-Critic RL algorithm and compare the results with strategies obtained analytically from continuous Optimal Navigation (ON) protocols. We show that, for our application, ON solutions are unstable for the typical duration of the navigation process and are therefore not useful in practice. On the other hand, RL solutions are much more robust with respect to small changes in the initial conditions and to external noise, and they are able to find optimal trajectories even when V_s is much smaller than the maximum flow velocity. Furthermore, we show how the RL approach is able to take advantage of the flow properties in order to reach the target, especially when the steering speed is small.
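
In the setup described above, the vessel moves with the local flow velocity plus a steering (slip) velocity of fixed magnitude V_s whose direction θ the agent chooses: dx/dt = u(x) + V_s·(cos θ, sin θ). The sketch below integrates these kinematics with forward Euler for a naive policy that always points at the target. The uniform cross-current, the time step, the arrival tolerance and all function names are illustrative assumptions; this is not the paper's turbulent flow or its Actor-Critic policy.

```python
import math

def simulate(flow, policy, start, target, v_s, dt=0.01, max_steps=100_000, tol=0.05):
    """Forward-Euler integration of dx/dt = u(x) + V_s * (cos(theta), sin(theta)).
    Returns (reached, elapsed_time, final_position)."""
    x, y = start
    for step in range(max_steps):
        if math.hypot(target[0] - x, target[1] - y) < tol:
            return True, step * dt, (x, y)
        theta = policy((x, y), target)  # steering direction chosen by the agent
        ux, uy = flow(x, y)
        x += dt * (ux + v_s * math.cos(theta))
        y += dt * (uy + v_s * math.sin(theta))
    return False, max_steps * dt, (x, y)

def point_at_target(pos, target):
    """Naive baseline: steer straight at the target, ignoring the flow."""
    return math.atan2(target[1] - pos[1], target[0] - pos[0])

def still_water(x, y):
    return 0.0, 0.0

def cross_current(x, y, speed=0.5):
    """Hypothetical uniform current pushing the vessel in +y."""
    return 0.0, speed

for flow in (still_water, cross_current):
    reached, t, _ = simulate(flow, point_at_target, (0.0, 0.0), (1.0, 0.0), v_s=1.0)
    print(flow.__name__, reached, round(t, 2))
```

The naive policy still arrives in the cross-current here because V_s exceeds the current speed, but it takes longer than in still water. The regime the abstract highlights, V_s much smaller than the flow, is where a learned policy that exploits the flow becomes necessary.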

artificial intelligence, trajectory, upstream oil & gas

1907.08591

Country:

Europe > Sweden (0.14)
Europe > Italy (0.14)
North America > United States > Maryland (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Photonic architecture for reinforcement learning

Flamini, Fulvio, Hamann, Arne, Jerbi, Sofiène, Trenkwalder, Lea M., Nautrup, Hendrik Poulsen, Briegel, Hans J.

arXiv.org Artificial Intelligence | Jul-17-2019

The last decade has seen an unprecedented growth in artificial intelligence and photonic technologies, both of which drive the limits of modern-day computing devices. In line with these recent developments, this work brings together the state of the art of both fields within the framework of reinforcement learning. We present the blueprint for a photonic implementation of an active learning machine incorporating contemporary algorithms such as SARSA, Q-learning, and projective simulation. We numerically investigate its performance within typical reinforcement learning environments, showing that realistic levels of experimental noise can be tolerated or even be beneficial for the learning process. Remarkably, the architecture itself enables mechanisms of abstraction and generalization, two features which are often considered key ingredients for artificial intelligence. The proposed architecture, based on single-photon evolution on a mesh of tunable beamsplitters, is simple, scalable, and a first integration in portable systems appears to be within the reach of near-term technology.
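
SARSA and Q-learning, two of the algorithms the proposed machine incorporates, differ only in the bootstrap target of their temporal-difference update. SARSA uses the action the agent actually takes next (on-policy), while Q-learning uses the greedy next action (off-policy). The sketch below shows both updates on a plain Python Q-table. It follows the conventional textbook formulation and says nothing about how the photonic hardware realizes them; the state and action names are hypothetical.

```python
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the next action actually chosen."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

def q_learning_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the best next action."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

# Same transition, different targets when the next action is exploratory ("left").
actions = ["left", "right"]
q_sarsa = defaultdict(float, {("s1", "left"): 0.0, ("s1", "right"): 10.0})
q_qlearn = defaultdict(float, q_sarsa)  # independent copy of the same table
sarsa_update(q_sarsa, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.9)
q_learning_update(q_qlearn, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
print(q_sarsa[("s0", "right")], q_qlearn[("s0", "right")])
```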

artificial intelligence, machine learning, reinforcement learning

doi: 10.1088/1367-2630/ab783c

1907.07503

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Variable Impedance Control for Contact Sensitive Tasks

Bogdanovic, Miroslav, Khadiv, Majid, Righetti, Ludovic

arXiv.org Artificial Intelligence | Jul-17-2019

Reinforcement learning algorithms have shown great success in solving problems ranging from playing video games to robotics. However, they struggle to solve delicate robotic problems, especially those involving contact interactions. Though in principle a policy outputting joint torques should be able to learn these tasks, in practice we see that such policies have difficulty solving the problem robustly without any structure in the action space. In this paper, we investigate how the choice of action space can give robust performance in the presence of contact uncertainties. We propose to learn a policy that outputs impedance and desired position in joint space as a function of system states without imposing any other structure on the problem. We compare the performance of this approach to torque and position control policies under different contact uncertainties. Extensive simulation results on two different systems, a hopper (floating-base) with intermittent contacts and a manipulator (fixed-base) wiping a table, show that our proposed approach outperforms policies outputting torque or position in terms of both learning rate and robustness to environment uncertainty.
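
In the proposed action space, the policy outputs joint stiffness, damping and a desired joint position instead of raw torques. A common way to turn such an action into torques is the joint-space impedance law τ = K·(q_des − q) − D·q̇. The sketch below assumes that law with diagonal gains, no gravity compensation, a hypothetical flat action layout, and hypothetical gain ranges for rescaling raw policy outputs in [−1, 1]. The paper's exact parameterization may differ.

```python
def rescale(raw, low, high):
    """Map a policy output in [-1, 1] to the interval [low, high]."""
    raw = max(-1.0, min(1.0, raw))
    return low + (raw + 1.0) * 0.5 * (high - low)

def impedance_torque(q, qdot, q_des, stiffness, damping):
    """Per-joint impedance law: tau_i = K_i * (q_des_i - q_i) - D_i * qdot_i."""
    return [k * (qd - qi) - d * v
            for qi, v, qd, k, d in zip(q, qdot, q_des, stiffness, damping)]

def action_to_torque(action, q, qdot, n_joints,
                     k_range=(10.0, 200.0), d_range=(0.5, 10.0)):
    """Split a flat action [q_des..., K_raw..., D_raw...] into an impedance command.
    The layout and gain ranges are illustrative assumptions."""
    q_des = action[:n_joints]
    stiffness = [rescale(a, *k_range) for a in action[n_joints:2 * n_joints]]
    damping = [rescale(a, *d_range) for a in action[2 * n_joints:3 * n_joints]]
    return impedance_torque(q, qdot, q_des, stiffness, damping)

print(impedance_torque([0.5, 0.1], [0.2, -0.1], [1.0, 0.0], [10.0, 20.0], [1.0, 2.0]))
```

Keeping the stiffness and damping strictly positive through the rescaling keeps each joint behaving like a spring-damper around q_des, which is one plausible reason such an action space is easier to learn in contact-rich tasks.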

artificial intelligence, machine learning, reinforcement learning

1907.075

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

PPO Dash: Improving Generalization in Deep Reinforcement Learning

Booth, Joe

arXiv.org Artificial Intelligence | Jul-17-2019

Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact with regard to generalization. Our experiments show that the combination of these improvements provides state-of-the-art performance on the Obstacle Tower Challenge.
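
The paper builds on PPO, whose core ingredient is the clipped surrogate objective L = mean over t of min(r_t·A_t, clip(r_t, 1 − ε, 1 + ε)·A_t). Here r_t is the probability ratio between the new and old policy for the action taken, and A_t is an advantage estimate. The sketch below evaluates that objective on a batch of precomputed ratios and advantages. It does not reproduce the specific improvements studied in PPO Dash, and the default ε = 0.2 is a common choice rather than a value from the paper.

```python
def clip(x, low, high):
    return max(low, min(high, x))

def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.
    PPO maximizes this quantity; a training loop would minimize its negative."""
    terms = [min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)

# A large ratio with positive advantage is capped at (1 + eps) * A.
print(ppo_clip_objective([1.5, 0.5], [2.0, -1.0]))
```

Taking the minimum makes the objective pessimistic. A policy change that moves the ratio outside [1 − ε, 1 + ε] in the advantageous direction earns no extra credit, which limits how far one update can move the policy.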

artificial intelligence, machine learning, reinforcement learning

1907.06704

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Prior Selection for Repertoire-based Online Learning in Robotics

Kaushik, Rituraj, Desreumaux, Pierre, Mouret, Jean-Baptiste

arXiv.org Artificial Intelligence | Jul-16-2019

Among the data-efficient approaches for online adaptation in robotics (meta-learning, model-based reinforcement learning, etc.), repertoire-based learning (1) generates a large and diverse set of policies in simulation that acts as a "reservoir" for future adaptations and (2) learns online to pick the best-working policies for the current situation (e.g., a damaged robot, a new object, etc.). Each of these policies performs a different task, for instance walking in a different direction; these policies are then sequenced with a planning algorithm to achieve the given task. In this paper, we relax the assumption of previous works that a single repertoire is enough for adaptation. Instead, we generate repertoires for many different situations (e.g., with a missing leg, on different floors, etc.) in simulation that act as priors for adaptation. Our main contribution is an algorithm, APROL (Adaptive Prior selection for Repertoire-based Online Learning), that plans the next action by incorporating these priors when the robot has no information about the current situation. We evaluate APROL on two simulated tasks: (1) pushing unknown objects of various shapes and sizes with a KUKA arm and (2) a goal-reaching task with a damaged hexapod robot. We compare with "Reset-free Trial and Error" (RTE) and various single-repertoire baselines. The results show that APROL solves both tasks in less interaction time than the baselines. Additionally, we demonstrate APROL on a real, damaged hexapod that quickly learns compensatory policies to reach a goal while avoiding obstacles in its path.
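
APROL has to choose among several simulated repertoires when the robot's situation is unknown. One simple way to express that choice is assumed here as a simplified stand-in, not the algorithm from the paper. It scores each repertoire by how well its predicted outcomes explain the outcomes observed so far, using an independent Gaussian log-likelihood, and then plans with the best-scoring repertoire. The repertoire names, policy ids, outcome values and noise level σ are all hypothetical.

```python
import math

def gaussian_log_likelihood(predicted, observed, sigma):
    """Log-likelihood of observed outcomes under N(predicted, sigma^2), independent per trial."""
    return sum(-0.5 * ((o - p) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for p, o in zip(predicted, observed))

def select_prior(repertoires, tried_policies, observed, sigma=0.1):
    """Return the repertoire whose predictions best match what actually happened.
    repertoires maps a situation name to {policy_id: predicted_outcome}."""
    def score(name):
        preds = [repertoires[name][pid] for pid in tried_policies]
        return gaussian_log_likelihood(preds, observed, sigma)
    return max(repertoires, key=score)

# Hypothetical example: displacement (m) predicted for two policies in two situations.
repertoires = {
    "intact": {"walk_fwd": 1.0, "turn_left": 0.2},
    "missing_leg": {"walk_fwd": 0.4, "turn_left": 0.5},
}
print(select_prior(repertoires, ["walk_fwd", "turn_left"], [0.45, 0.48]))
```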

artificial intelligence, machine learning, reinforcement learning

1907.07029

Country: Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)

Genre: Research Report (0.84)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Improved Reinforcement Learning through Imitation Learning Pretraining Towards Image-based Autonomous Driving

Wang, Tianqi, Chang, Dong Eui

arXiv.org Artificial Intelligence | Jul-16-2019

We present a training pipeline for the autonomous driving task that takes the current camera image and vehicle speed as input and produces throttle, brake, and steering control outputs. The simulator AirSim's convenient weather and lighting API provides sufficient diversity during training, which can be very helpful for increasing the trained policy's robustness. To avoid limiting the policy's achievable performance, we use a continuous and deterministic control policy setting. We use ResNet-34 as our actor and critic networks, with slight changes in the fully connected layers. Considering humans' mastery of this task and its high complexity, we first use imitation learning to mimic the given human policy and then transfer the trained policy and its weights to the reinforcement learning phase, for which we use DDPG. This combination shows a considerable performance boost compared to both pure imitation learning and pure DDPG for the autonomous driving task.
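
In the DDPG phase described above, the critic is trained toward the bootstrapped target y = r + γ·(1 − d)·Q′(s′, μ′(s′)), which is computed with slowly updated target networks. Those target networks track the learned weights through soft updates θ′ ← τ·θ + (1 − τ)·θ′. The sketch below shows only these two pieces, with plain Python numbers standing in for network outputs and weights. The γ and τ defaults are common choices, not values from the paper.

```python
def critic_target(reward, done, next_q_from_target_nets, gamma=0.99):
    """Bootstrapped DDPG critic target; no bootstrapping past a terminal state.
    next_q_from_target_nets stands in for Q'(s', mu'(s'))."""
    return reward + gamma * (0.0 if done else next_q_from_target_nets)

def soft_update(target_weights, online_weights, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * tw for tw, w in zip(target_weights, online_weights)]

print(critic_target(1.0, False, 2.0), soft_update([0.0, 0.0], [1.0, 1.0], tau=0.1))
```

In the pipeline above, the imitation-learned weights would initialize the online actor (and its target copy) before these updates begin.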

artificial intelligence, machine learning, reinforcement learning

1907.06838

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.83)
Information Technology > Robotics & Automation (0.83)
Automobiles & Trucks (0.83)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

An Inductive Synthesis Framework for Verifiable Reinforcement Learning

Zhu, He, Xiong, Zikang, Magill, Stephen, Jagannathan, Suresh

arXiv.org Artificial Intelligence | Jul-16-2019

Despite the tremendous advances made in the last decade in developing useful machine-learning applications, their wider adoption has been hindered by the lack of strong assurance guarantees about their behavior. In this paper, we consider how formal verification techniques developed for traditional software systems can be repurposed to verify reinforcement learning-enabled systems, a particularly important class of machine learning systems. Rather than enforcing safety by examining and altering the structure of a complex neural network implementation, our technique uses blackbox methods to synthesize deterministic programs: simpler, more interpretable approximations of the network that can nonetheless guarantee that desired safety properties are preserved, even when the network is deployed in unanticipated or previously unobserved environments. Our methodology frames the problem of neural network verification in terms of a counterexample- and syntax-guided inductive synthesis procedure over these programs. The synthesis procedure searches for both a deterministic program and an inductive invariant over an infinite state transition system that represents a specification of an application's control logic. Additional specifications defining environment-based constraints can also be provided to further refine the search space. Synthesized programs deployed in conjunction with a neural network implementation dynamically enforce safety conditions by monitoring and preventing potentially unsafe actions proposed by neural policies. Experimental results over a wide range of cyber-physical applications demonstrate that software-inspired formal verification techniques can be used to realize trustworthy reinforcement learning systems with low overhead.
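
The deployment scheme described above lets a synthesized program monitor the neural policy and block actions that would violate the safety specification. A minimal sketch of such a runtime shield follows. It assumes one-dimensional dynamics x′ = x + dt·v, a position bound |x| ≤ x_max, and a hypothetical hand-written fallback program that steers back toward the origin. In the paper, the fallback program and its inductive invariant are synthesized automatically rather than written by hand.

```python
def step(x, v, dt=0.1):
    """Assumed plant model: position integrates the commanded velocity."""
    return x + dt * v

def is_safe(x, x_max=1.0):
    """Safety specification: stay inside [-x_max, x_max]."""
    return abs(x) <= x_max

def fallback_program(x, gain=2.0):
    """Hypothetical stand-in for a synthesized deterministic program.
    Under step(), it maps x to (1 - gain * dt) * x, which stays in bounds
    whenever 0 <= gain * dt <= 2."""
    return -gain * x

def shielded_action(x, neural_action, dt=0.1, x_max=1.0):
    """Pass the neural action through unless the predicted next state is unsafe."""
    if is_safe(step(x, neural_action, dt), x_max):
        return neural_action, False
    return fallback_program(x), True  # True flags that the shield intervened

print(shielded_action(0.0, 1.0), shielded_action(0.95, 1.0))
```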

artificial intelligence, machine learning, reinforcement learning

doi: 10.1145/3314221.3314638

1907.07273

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation (0.46)
Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning

Landolfi, Nicholas C., Thomas, Garrett, Ma, Tengyu

arXiv.org Artificial Intelligence | Jul-15-2019

The aim of multi-task reinforcement learning is two-fold: (1) to learn efficiently by training against multiple tasks and (2) to adapt quickly, using limited samples, to a variety of new tasks. In this work, the tasks correspond to reward functions for environments with the same (or similar) dynamical models. We propose to learn a dynamical model during the training process and use this model to perform sample-efficient adaptation to new tasks at test time. We use significantly fewer samples by performing policy optimization only in a "virtual" environment whose transitions are given by our learned dynamical model. Our algorithm sequentially trains against several tasks. Upon encountering a new task, we first warm up a policy on our learned dynamical model, which requires no new samples from the environment. We then adapt the dynamical model with samples from this policy in the real environment. We evaluate our approach on several continuous control benchmarks and demonstrate its advantage over MAML, a state-of-the-art meta-learning algorithm, on these tasks.
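
The approach above learns a dynamics model from real transitions and then optimizes policies inside a "virtual" environment built from that model. The following is a toy illustration only. It assumes a scalar linear system x′ = a·x + b·u, whereas the paper learns dynamics models for continuous-control benchmarks. The sketch fits a and b by least squares, then warms up a policy by comparing candidate feedback gains purely inside the learned model, without new real samples. All names, the quadratic cost and the candidate gains are hypothetical.

```python
import random

def real_env_step(x, u, a=0.9, b=0.5):
    """The 'real' environment; its parameters are unknown to the learner."""
    return a * x + b * u

def fit_linear_model(transitions):
    """Least-squares fit of x' ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for x, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def virtual_rollout(model, policy, x0, horizon):
    """Accumulate a quadratic cost for a policy inside the learned model only."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = policy(x)
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost

rng = random.Random(0)
data = []
for _ in range(50):
    x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    data.append((x, u, real_env_step(x, u)))
model = fit_linear_model(data)

# Warm up: compare candidate gains for u = -k * x using the learned model alone.
candidates = [0.0, 0.9, 1.8]
best_gain = min(candidates, key=lambda k: virtual_rollout(model, lambda x: -k * x, 1.0, 20))
print(model, best_gain)
```

In the full method, this warmed-up policy would then collect a small number of real samples to refine the model for the new task.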

artificial intelligence, machine learning, reinforcement learning

1907.04964

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
