AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

On the Convergence of Consensus Algorithms with Markovian Noise and Gradient Bias

Wai, Hoi-To

arXiv.org Machine Learning, Nov-5-2020

This paper presents a finite time convergence analysis for a decentralized stochastic approximation (SA) scheme. The scheme generalizes several algorithms for decentralized machine learning and multi-agent reinforcement learning. Our proof technique involves separating the iterates into their respective consensual parts and consensus error. The consensus error is bounded in terms of the stationarity of the consensual part, while the updates of the consensual part can be analyzed as a perturbed SA scheme. Under the Markovian noise and time varying communication graph assumptions, the decentralized SA scheme has an expected convergence rate of ${\cal O}(\log T/ \sqrt{T} )$, where $T$ is the iteration number, in terms of squared norms of gradient for nonlinear SA with smooth but non-convex cost function. This rate is comparable to the best known performances of SA in a centralized setting with a non-convex potential function.
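As a rough illustration of the kind of scheme the abstract describes, the sketch below runs a decentralized SA update in which each agent first averages its neighbours' iterates and then takes a noisy local gradient step. It is not the paper's algorithm or analysis: the quadratic local costs, the fixed doubly stochastic mixing matrix (the paper allows time-varying graphs), the i.i.d. Gaussian noise (in place of Markovian noise), and the names `decentralized_sa`, `targets`, and `mixing` are all assumptions made for this example.

```python
import random


def decentralized_sa(targets, mixing, steps=2000, noise_std=0.1, seed=0):
    """Run x_i <- sum_j W_ij * x_j - gamma_t * (grad f_i(x_i) + noise).

    Assumption for illustration: agent i holds the local cost
    f_i(x) = 0.5 * (x - targets[i]) ** 2, so grad f_i(x) = x - targets[i].
    Returns the average iterate and the mean squared deviation from it.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for t in range(1, steps + 1):
        gamma = 1.0 / t ** 0.5  # diminishing step size
        # Consensus step: every agent mixes the iterates of its neighbours.
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: noisy gradient evaluated at the agent's own iterate.
        x = [
            mixed[i] - gamma * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
    average = sum(x) / n
    consensus_error = sum((xi - average) ** 2 for xi in x) / n
    return average, consensus_error


# Three agents on a fully connected graph with a doubly stochastic matrix.
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
avg, err = decentralized_sa([1.0, 2.0, 3.0], W)
print(avg, err)  # avg should approach 2.0 (the mean target); err should be small
```

Here the returned average plays the role of the consensual part of the iterates, and the mean squared deviation from it plays the role of the consensus error.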

approximation, optimization, stochastic update, (14 more...)

arXiv:2008.07841

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

LBGP: Learning Based Goal Planning for Autonomous Following in Front

Nikdel, Payam, Vaughan, Richard, Chen, Mo

arXiv.org Artificial Intelligence, Nov-5-2020

This paper investigates a hybrid solution that combines deep reinforcement learning (RL) and classical trajectory planning for the following-in-front application. Here, an autonomous robot aims to stay ahead of a person as the person freely walks around. Following in front is a challenging problem because the user's intended trajectory is unknown and needs to be estimated, explicitly or implicitly, by the robot. In addition, the robot needs to find a feasible way to safely navigate ahead of the human's trajectory. Our deep RL module implicitly estimates the human's trajectory and produces short-term navigational goals to guide the robot. These goals are used by a trajectory planner to smoothly navigate the robot to the short-term goals, and eventually in front of the user. We employ curriculum learning in the deep RL module to efficiently achieve a high return. Our system outperforms the state of the art in following ahead and is more reliable than end-to-end alternatives in both simulation and real-world experiments. In contrast to a pure deep RL approach, we demonstrate zero-shot transfer of the trained policy from simulation to the real world.

experiment, robot, trajectory, (16 more...)

arXiv:2011.03125

Country: North America > Canada (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning

Boney, Rinu, Sainio, Jussi, Kaivola, Mikko, Solin, Arno, Kannala, Juho

arXiv.org Artificial Intelligence, Nov-5-2020

Current robot platforms available for research are either very expensive or unable to handle the abuse of exploratory controls in reinforcement learning. We develop RealAnt, a minimal low-cost physical version of the popular 'Ant' benchmark used in reinforcement learning. RealAnt costs only 350 € ($410) in materials and can be assembled in less than an hour. We validate the platform with reinforcement learning experiments and provide baseline results on a set of benchmark tasks. We demonstrate that the TD3 algorithm can learn to walk the RealAnt from less than 45 minutes of experience. We also provide simulator versions of the robot (with the same dimensions, state-action spaces, and delayed noisy observations) in the MuJoCo and PyBullet simulators.

algorithm, realant, robot, (15 more...)

arXiv:2011.03085

Country:

Europe > Finland (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Hu, Yujing, Wang, Weixun, Jia, Hangtian, Wang, Yixiang, Chen, Yingfeng, Hao, Jianye, Wu, Feng, Fan, Changjie

arXiv.org Artificial Intelligence, Nov-5-2020

Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL). Existing approaches such as potential-based reward shaping normally make full use of a given shaping reward function. However, since the transformation of human knowledge into numeric reward values is often imperfect due to reasons such as human cognitive bias, completely utilizing the shaping reward function may fail to improve the performance of RL algorithms. In this paper, we consider the problem of adaptively utilizing a given shaping reward function. We formulate the utilization of shaping rewards as a bi-level optimization problem, where the lower level is to optimize policy using the shaping rewards and the upper level is to optimize a parameterized shaping weight function for true reward maximization. We formally derive the gradient of the expected true reward with respect to the shaping weight function parameters and accordingly propose three learning algorithms based on different assumptions. Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards, and meanwhile ignore unbeneficial shaping rewards or even transform them into beneficial ones.
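To make the terms in this abstract concrete, here is a minimal sketch of potential-based reward shaping, where the shaping term is F(s, s') = gamma * phi(s') - phi(s), combined with a scalar weight on the shaping reward. It does not reproduce any of the paper's three algorithms: the five-state chain, the potential `phi`, the constant `weight`, and all function names are hypothetical choices made for illustration.

```python
def potential_shaping(phi, s, s_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def weighted_reward(true_reward, shaping_reward, weight):
    """Combine the true reward with a weighted shaping reward."""
    return true_reward + weight * shaping_reward


def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical 5-state chain: the agent walks 0 -> 1 -> 2 -> 3 -> 4 and is
# rewarded only on the final transition (a sparse true reward).
gamma = 0.9
states = [0, 1, 2, 3, 4]
true_rewards = [0.0, 0.0, 0.0, 1.0]
phi = lambda s: float(s)  # assumed potential: progress along the chain

# One shaping term per transition (s_t, s_{t+1}).
shaping = [potential_shaping(phi, s, s2, gamma) for s, s2 in zip(states, states[1:])]

# weight = 0 ignores the shaping reward; weight = 1 uses it in full.
for weight in (0.0, 0.5, 1.0):
    rewards = [weighted_reward(r, f, weight) for r, f in zip(true_rewards, shaping)]
    print(weight, discounted_return(rewards, gamma))
```

Because the potential-based terms telescope, their discounted sum along a trajectory depends only on its endpoints; the weight scales how much of that signal reaches the learner. In the paper the weight is a parameterized function that is itself optimized, not a hand-picked constant as it is here.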

algorithm, reward function, training step, (15 more...)

arXiv:2011.02669

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Deep Reactive Planning in Dynamic Environments

Ota, Kei, Jha, Devesh K., Onishi, Tadashi, Kanezaki, Asako, Yoshiyasu, Yusuke, Sasaki, Yoko, Mariyama, Toshisada, Nikovski, Daniel

arXiv.org Artificial Intelligence, Nov-5-2020

The main novelty of the proposed approach is that it allows a robot to learn an end-to-end policy which can adapt to changes in the environment during execution. While goal conditioning of policies has been studied in the RL literature, such approaches are not easily extended to cases where the robot's goal can change during execution. This is something that humans are naturally able to do. However, it is difficult for robots to learn such reflexes (i.e., to naturally respond to dynamic environments), especially when the goal location is not explicitly provided to the robot, and instead needs to be perceived through a vision sensor. In the current work, we present a method that can achieve such behavior by combining traditional kinematic planning, deep learning, and deep reinforcement learning in a synergistic fashion to generalize to arbitrary environments. We demonstrate the proposed approach for several reaching and pick-and-place tasks in simulation, as well as on a real system of a 6-DoF industrial manipulator. A video describing our work can be found at https://youtu.be/hE-Ew59GRPQ.

agent, robot, waypoint, (16 more...)

arXiv:2011.00155

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

How to Make Sense of the Reinforcement Learning Agents? - KDnuggets

#artificialintelligence, Nov-4-2020, 22:45:08 GMT

Based on simply watching how an agent acts in the environment it is hard to tell anything about why it behaves this way and how it works internally. That's why it is crucial to establish metrics that tell WHY the agent performs in a certain way. This is challenging especially when the agent doesn't behave the way we would like it to behave, … which is like always. Every AI practitioner knows that whatever we work on, most of the time it won't simply work out of the box (they wouldn't pay us so much for it otherwise). In this blog post, you'll learn what to keep track of to inspect/debug your agent learning trajectory. I'll assume you are already familiar with the Reinforcement Learning (RL) agent-environment setting (see Figure 1) and you've heard about at least some of the most common RL algorithms and environments. Nevertheless, don't worry if you are just beginning your journey with RL.

agent, episode return, experiment, (16 more...)

Country:

Europe > Poland > Pomerania Province > Gdańsk (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Google, OpenAI & DeepMind: Shared Task Behaviour Priors Can Boost RL and Generalization

#artificialintelligence, Nov-4-2020, 17:30:46 GMT

Researchers in recent years have deployed reinforcement learning (RL) agents to solve increasingly challenging problems. As the trend continues, so does the development of new methods that enable the injection of "priors" (prior knowledge) into agents to help them better understand the structure of the world and come up with more effective solution strategies. In a new paper, researchers from Google, OpenAI, and DeepMind introduce "behaviour priors," a framework designed to capture common movement and interaction patterns that are shared across a set of related tasks or contexts. The researchers discuss how such behaviour patterns can be captured using probabilistic trajectory models and how they can be integrated effectively into RL schemes, such as for facilitating multi-task and transfer learning. Their method for learning behaviour priors can lead to significant speedups on complex tasks, the researchers say.

large language model, machine learning, reinforcement learning, (13 more...)

Genre: Research Report > New Finding (0.58)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Deploying reinforcement learning in production using Ray and Amazon SageMaker

#artificialintelligence, Nov-4-2020, 17:25:24 GMT

Reinforcement learning (RL) is used to automate decision-making in a variety of domains, including games, autoscaling, finance, robotics, recommendations, and supply chain. Launched at AWS re:Invent 2018, Amazon SageMaker RL helps you quickly build, train, and deploy policies learned by RL. Ray is an open-source distributed execution framework that makes it easy to scale your Python applications. Amazon SageMaker RL uses the RLlib library that builds on the Ray framework to train RL policies. This post walks you through the tools available in Ray and Amazon SageMaker RL that help you address challenges such as scale, security, iterative development, and operational cost when you use RL in production.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Adaptive Stress Testing of Trajectory Predictions in Flight Management Systems

Moss, Robert J., Lee, Ritchie, Visser, Nicholas, Hochwarth, Joachim, Lopez, James G., Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence, Nov-4-2020

To find failure events and their likelihoods in flight-critical systems, we investigate the use of an advanced black-box stress testing approach called adaptive stress testing. We analyze a trajectory predictor from a developmental commercial flight management system which takes as input a collection of lateral waypoints and en-route environmental conditions. Our aim is to search for failure events relating to inconsistencies in the predicted lateral trajectories. The intention of this work is to find likely failures and report them back to the developers so they can address and potentially resolve shortcomings of the system before deployment. To improve search performance, this work extends the adaptive stress testing formulation to be applied more generally to sequential decision-making problems with episodic reward by collecting the state transitions during the search and evaluating at the end of the simulated rollout. We use a modified Monte Carlo tree search algorithm with progressive widening as our adversarial reinforcement learner. The performance is compared to direct Monte Carlo simulations and to the cross-entropy method as an alternative importance sampling baseline. The goal is to find potential problems otherwise not found by traditional requirements-based testing. Results indicate that our adaptive stress testing approach finds more failures and finds failures with higher likelihood relative to the baseline approaches.

failure event, miss distance, waypoint, (14 more...)

arXiv:2011.02559

Country:

North America > United States > Michigan > Kent County > Grand Rapids (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Gangwani, Tanmay, Peng, Jian, Zhou, Yuan

arXiv.org Machine Learning, Nov-4-2020

The goal in Reinforcement Learning (RL) is to learn agents that maximize long-term environmental rewards. Deep RL, which uses deep neural networks as function approximators for the policy and value-functions, has achieved outstanding results on a wide variety of sequential decision making problems, with the barometer of success usually being the total returns accumulated by the final policy. Due to the intrinsic nature of direct reward maximization, seldom is the focus on how the behavioral characteristics of the trained agent compare with the other possible behaviors in the solution space. For instance, consider the robotic manipulator arm in Figure 1a and the peg-insertion task. Though the task description is simple, for a sufficiently flexible arm, there are numerous ways (positions of the joints and the end-effector) to insert the peg in the hole (Figure 1b).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv:2011.02614

Country:

North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
