AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Bias-reduced multi-step hindsight experience replay

Yang, Rui, Lyu, Jiafei, Yang, Yu, Ya, Jiangpeng, Luo, Feng, Luo, Dijun, Li, Lanqing, Li, Xiu

arXiv.org Artificial Intelligence | Feb-25-2021

Multi-goal reinforcement learning is widely used in planning and robot manipulation. Its two main challenges are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both with hindsight knowledge. However, HER and its previous variants still need millions of samples and heavy computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), based on $n$-step relabeling, which incorporates multi-step relabeled returns to improve sample efficiency. Despite the advantages of $n$-step relabeling, we show theoretically and experimentally that the off-policy $n$-step bias it introduces may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms, MHER($\lambda$) and Model-based MHER (MMHER). MHER($\lambda$) exploits the $\lambda$ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation beyond HER.
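The $n$-step relabeling idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the paper: the `sparse_reward` function, the toy 1-D trajectory, the zero value function, and in particular the normalized weighting used in `lambda_target`, which is one plausible way to blend 1..n-step targets and may differ from the exact MHER($\lambda$) definition. The intermediate steps in a real replay buffer come from older policies, which is where the off-policy $n$-step bias mentioned above would enter.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
ValueFn = Callable[[State, State], float]


def sparse_reward(achieved: State, goal: State, tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 0 if the goal is reached, -1 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def n_step_target(achieved_next: Sequence[State], goal: State, start: int,
                  n: int, gamma: float, value_fn: ValueFn) -> float:
    """n-step target from step `start`, with rewards recomputed for `goal`."""
    end = min(start + n, len(achieved_next))
    target = 0.0
    for k, idx in enumerate(range(start, end)):
        target += gamma ** k * sparse_reward(achieved_next[idx], goal)
    # Bootstrap from the value estimate at the last state reached.
    target += gamma ** (end - start) * value_fn(achieved_next[end - 1], goal)
    return target


def lambda_target(achieved_next: Sequence[State], goal: State, start: int,
                  n: int, gamma: float, lam: float, value_fn: ValueFn) -> float:
    """Normalized lambda-weighted average of the 1..n-step targets.

    Assumed form for illustration; requires lam > 0.
    """
    weights = [lam ** i for i in range(1, n + 1)]
    targets = [n_step_target(achieved_next, goal, start, i, gamma, value_fn)
               for i in range(1, n + 1)]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)


# Toy 1-D trajectory: the state reached after each of three transitions.
achieved_next: List[State] = [(0.0,), (0.5,), (1.0,)]
# Hindsight relabeling: pretend the goal was a state that was actually reached.
hindsight_goal = achieved_next[-1]
zero_value: ValueFn = lambda state, goal: 0.0  # placeholder critic

two_step = n_step_target(achieved_next, hindsight_goal, 0, 2, 0.9, zero_value)
blended = lambda_target(achieved_next, hindsight_goal, 0, 2, 0.9, 0.5, zero_value)
```

With a learned critic in place of `zero_value`, a smaller `lam` would put more weight on the short, less biased targets, while a larger `lam` would lean on the longer ones.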

mher, off-policy $n$-step bias, transition, (11 more...)

arXiv:2102.12962

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Positive Reinforcements Help Algorithm Forecast Underground Natural Reserves

#artificialintelligence | Feb-24-2021, 18:40:19 GMT

Texas A&M University (TAMU) and University of Oklahoma researchers have developed a reinforcement-based algorithm that automates forecasting of subterranean properties, enabling accurate prediction of oil and gas reserves. The algorithm characterizes the underground environment using rewards accumulated for correct predictions of the pressure and flow anticipated from boreholes. The TAMU team found that within 10 iterations of reinforcement learning, the algorithm could correctly and rapidly predict the properties of simple subsurface scenarios. TAMU's Siddharth Misra said, "We have turned history matching into a sequential decision-making problem, which has the potential to reduce engineers' efforts, mitigate human bias, and remove the need of large sets of labeled training data."

algorithm forecast underground natural reserve, artificial intelligence, upstream oil & gas, (8 more...)

Country:

North America > United States > Texas (0.63)
North America > United States > Oklahoma (0.56)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

3 ways to get into reinforcement learning

#artificialintelligence | Feb-24-2021, 15:10:42 GMT

When I was in graduate school in the 1990s, one of my favorite classes was neural networks. Back then, we didn't have access to TensorFlow, PyTorch, or Keras; we programmed neurons, neural networks, and learning algorithms by hand with the formulas from textbooks. We didn't have access to cloud computing, and we coded sequential experiments that often ran overnight. There weren't platforms like Alteryx, Dataiku, SageMaker, or SAS to enable a machine learning proof of concept or manage the end-to-end MLops lifecycles. I was most interested in reinforcement learning algorithms, and I recall writing hundreds of reward functions to stabilize an inverted pendulum.

algorithm, reinforcement, reward function, (8 more...)

Genre: Instructional Material (0.32)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.31)
Education > Educational Setting > Online (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.78)

Multi-Agent Deep Reinforcement Learning in 13 Lines of Code Using PettingZoo

#artificialintelligence | Feb-24-2021, 08:15:13 GMT

This tutorial provides a simple introduction to multi-agent reinforcement learning, assuming a little experience in machine learning and knowledge of Python. Reinforcement learning stems from using machine learning to optimally control an agent in an environment. It works by learning a policy, a function that maps an observation obtained from the environment to an action. Policy functions are typically deep neural networks, which gives rise to the name "deep reinforcement learning." The goal of reinforcement learning is to learn an optimal policy, one that achieves the maximum expected reward from the environment when acting.
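To make the idea of a policy concrete, here is a minimal single-agent sketch that uses no RL library. The toy line-world environment, the two hand-written policies, and all function names are assumptions made for illustration; they are not the tutorial's code and do not use the PettingZoo API. Because the toy environment is deterministic, the "expected" reward reduces to the reward of a single rollout.

```python
from typing import Callable, List

Observation = int  # position on a line, 0..3
Action = int       # 0 = move left, 1 = move right
Policy = Callable[[Observation], Action]


def rollout(policy: Policy, start: int = 0, max_steps: int = 10) -> float:
    """Run one episode and return the total reward (1.0 for reaching position 3)."""
    pos, total = start, 0.0
    for _ in range(max_steps):
        action = policy(pos)  # the policy maps an observation to an action
        pos = max(0, min(3, pos + (1 if action == 1 else -1)))
        if pos == 3:
            total += 1.0
            break
    return total


def always_left(obs: Observation) -> Action:
    return 0


def always_right(obs: Observation) -> Action:
    return 1


# "Learning" here is just picking the candidate policy with the highest reward.
candidates: List[Policy] = [always_left, always_right]
best_policy = max(candidates, key=rollout)
```

A deep RL method would replace the hand-written candidates with a neural network whose parameters are adjusted toward higher reward.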

learning, reinforcement, reinforcement learning, (12 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.91)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

How reinforcement learning chooses the ads you see

#artificialintelligence | Feb-24-2021, 03:10:10 GMT

Every day, digital advertisement agencies serve billions of ads on news websites, search engines, social media networks, video streaming websites, and other platforms. And they all want to answer the same question: Which of the many ads they have in their catalog is more likely to appeal to a certain viewer? Finding the right answer to this question can have a huge impact on revenue when you are dealing with hundreds of websites, thousands of ads, and millions of visitors. Fortunately (for the ad agencies, at least), reinforcement learning (RL), the branch of artificial intelligence that has become renowned for mastering board and video games, provides a solution. Reinforcement learning models seek to maximize rewards.
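One simple way to frame the ad-selection question is as a multi-armed bandit solved with an epsilon-greedy rule. The sketch below is an assumption chosen for illustration, not necessarily the method the article describes: the click probabilities, the `epsilon` value, and the function name are all hypothetical, and clicks are simulated rather than observed.

```python
import random
from typing import List, Tuple


def run_epsilon_greedy(click_probs: List[float], steps: int,
                       epsilon: float = 0.1,
                       seed: int = 0) -> Tuple[List[float], List[int]]:
    """Simulate ad selection; returns (estimated click rates, impression counts)."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    values = [0.0] * len(click_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            ad = rng.randrange(len(click_probs))  # explore: show a random ad
        else:
            ad = max(range(len(values)), key=values.__getitem__)  # exploit
        # Simulated viewer: reward 1 for a click, 0 otherwise.
        reward = 1.0 if rng.random() < click_probs[ad] else 0.0
        counts[ad] += 1
        values[ad] += (reward - values[ad]) / counts[ad]  # running mean
    return values, counts


# Hypothetical true click rates for three ads in the catalog.
values, counts = run_epsilon_greedy([0.02, 0.05, 0.30], steps=5000)
best_ad = max(range(len(values)), key=values.__getitem__)
```

The exploration rate trades off revenue lost on weak ads against the risk of never discovering a better one; a real system would also condition on viewer features.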

agent, impression, reinforcement, (13 more...)

Country:

North America (0.05)
Europe (0.05)
Asia > East Asia (0.04)
Africa (0.04)

Industry:

Marketing (0.49)
Leisure & Entertainment (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Sufficient Statistic for Influence in Structured Multiagent Environments

Oliehoek, Frans (Delft University of Technology) | Witwicki, Stefan (Nissan) | Kaelbling, Leslie (MIT)

Journal of Artificial Intelligence Research | Feb-24-2021

Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI has tried to mitigate this problem by trying to distill interaction to its essence: how does the policy of one agent influence another agent? If we can find more compact representations of such influence, this can help us deal with the complexity, for instance by searching the space of influences rather than the space of policies. However, so far these notions of influence have been restricted in their applicability to special cases of interaction. In this paper we formalize influence-based abstraction (IBA), which facilitates the elimination of latent state factors without any loss in value, for a very general class of problems described as factored partially observable stochastic games (fPOSGs). On the one hand, this generalizes existing descriptions of influence, and thus can serve as the foundation for improvements in scalability and other insights in decision making in complex multiagent settings. On the other hand, since the presence of other agents can be seen as a generalization of single agent settings, our formulation of IBA also provides a sufficient statistic for decision making under abstraction for a single agent. We also give a detailed discussion of the relations to such previous works, identifying new insights and interpretations of these approaches. In these ways, this paper deepens our understanding of abstraction in a wide range of sequential decision making settings, providing the basis for new approaches and algorithms for a large class of problems.

abstraction, agent, oliehoek, (13 more...)

doi: 10.1613/jair.1.12136

AI Access Foundation

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(6 more...)

Genre:

Research Report (0.45)
Overview (0.45)

Industry:

Leisure & Entertainment > Games (0.67)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(6 more...)

Combining Off and On-Policy Training in Model-Based Reinforcement Learning

Borges, Alexandre, Oliveira, Arlindo

arXiv.org Artificial Intelligence | Feb-24-2021

The combination of deep learning and Monte Carlo Tree Search (MCTS) has been shown to be effective in various domains, such as board and video games. AlphaGo represented a significant step forward in our ability to learn complex board games, and it was rapidly followed by significant advances, such as AlphaGo Zero and AlphaZero. Recently, MuZero demonstrated that it is possible to master both Atari games and board games by directly learning a model of the environment, which is then used with MCTS to decide what move to play in each position. During tree search, the algorithm simulates games by exploring several possible moves and then picks the action that corresponds to the most promising trajectory. When training, limited use is made of these simulated games, since none of their trajectories are directly used as training examples. Even if we consider that not all trajectories from simulated games are useful, there are thousands of potentially useful trajectories that are discarded. Using information from these trajectories would provide more training data, more quickly, leading to faster convergence and higher sample efficiency. Recent work introduced an off-policy value target for AlphaZero that uses data from simulated games. In this work, we propose a way to obtain off-policy targets using data from simulated games in MuZero. We combine these off-policy targets with the on-policy targets already used in MuZero in several ways, and study the impact of these targets and their combinations in three environments with distinct characteristics. When used in the right combinations, our results show that these targets speed up the training process and lead to faster convergence and higher rewards than the ones obtained by MuZero.

muzero, simulation, trajectory, (16 more...)

arXiv:2102.12194

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Portugal > Lisbon > Lisbon (0.05)

Genre: Research Report > New Finding (0.54)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Leisure & Entertainment > Games > Chess (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Hybrid Car-Following Strategy based on Deep Deterministic Policy Gradient and Cooperative Adaptive Cruise Control

Yan, Ruidong, Jiang, Rui, Jia, Bin, Yang, Diange, Huang, Jin

arXiv.org Artificial Intelligence | Feb-24-2021

A car-following strategy based on deep deterministic policy gradient (DDPG) can break through the constraints of the differential equation model thanks to its ability to explore complex environments. However, the car-following performance of DDPG is usually degraded by unreasonable reward function design, insufficient training, and low sampling efficiency. To address these problems, a hybrid car-following strategy based on DDPG and cooperative adaptive cruise control (CACC) is proposed. First, the car-following process is modeled as a Markov decision process so that CACC and DDPG can be computed simultaneously at each frame. Given the current state, two actions are obtained, one from CACC and one from DDPG. The action offering the larger reward is then chosen as the output of the hybrid strategy. Meanwhile, a rule is designed to ensure that the rate of change of acceleration is smaller than the desired value. The proposed strategy therefore not only guarantees the basic car-following performance through CACC, but also makes full use of DDPG's ability to explore complex environments. Finally, simulation results show that the car-following performance of the proposed strategy is improved significantly compared with that of DDPG and CACC over the whole state space.
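The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the two controllers and the reward function are hypothetical stand-ins passed in as callables, and the jerk rule is implemented as a simple clamp on the change in acceleration per time step, which may differ from the rule used in the paper.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # hypothetical: (gap to leader in m, relative speed in m/s)
Controller = Callable[[State], float]      # returns an acceleration in m/s^2
RewardFn = Callable[[State, float], float]


def hybrid_action(state: State, cacc: Controller, ddpg: Controller,
                  reward_fn: RewardFn, prev_accel: float,
                  max_jerk: float, dt: float) -> float:
    """Pick the higher-reward action, then limit the change in acceleration."""
    a_cacc = cacc(state)
    a_ddpg = ddpg(state)
    # Ties go to CACC, the rule-based baseline (an arbitrary choice here).
    chosen = a_cacc if reward_fn(state, a_cacc) >= reward_fn(state, a_ddpg) else a_ddpg
    # Assumed jerk rule: |a_t - a_{t-1}| <= max_jerk * dt.
    max_delta = max_jerk * dt
    return max(prev_accel - max_delta, min(prev_accel + max_delta, chosen))


# Toy stand-ins for the two controllers and the reward.
toy_cacc: Controller = lambda state: 0.5
toy_ddpg: Controller = lambda state: 2.5
toy_reward: RewardFn = lambda state, accel: -(accel - 1.0) ** 2

limited = hybrid_action((20.0, -1.0), toy_cacc, toy_ddpg, toy_reward,
                        prev_accel=0.0, max_jerk=2.0, dt=0.1)
unlimited = hybrid_action((20.0, -1.0), toy_cacc, toy_ddpg, toy_reward,
                          prev_accel=0.0, max_jerk=100.0, dt=0.1)
```

In the toy call, CACC's action scores higher, and the tight jerk limit then caps how far the acceleration can move in one step.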

cacc, ddpg, vehicle, (14 more...)

arXiv:2103.03796

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.36)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Lazic, Nevena, Yin, Dong, Abbasi-Yadkori, Yasin, Szepesvari, Csaba

arXiv.org Machine Learning | Feb-24-2021

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation. Our result provides the first high-probability $O(\sqrt{T})$ regret bound for a computationally efficient algorithm in this setting. The exact implementation of Politex with neural network function approximation is inefficient in terms of memory and computation. Since our analysis suggests that we need to approximate the average of the action-value functions of past policies well, we propose a simple efficient implementation where we train a single Q-function on a replay buffer with past data. We show that this often leads to superior performance over other implementation choices, especially in terms of wall-clock time. Our work also provides a novel theoretical justification for using experience replay within policy iteration algorithms.
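The role of past action-value functions can be illustrated with a small tabular sketch. It assumes the commonly described form of regularized policy iteration in which the next policy is a softmax over the sum of earlier action-value estimates, scaled by a step size `eta`. The function name, the numbers, and the single-state setup are hypothetical, and the paper's replay-buffer variant, which trains one Q-function instead of storing all past ones, is not implemented here.

```python
import math
from typing import List, Sequence


def softmax_policy(past_q_values: Sequence[Sequence[float]], eta: float) -> List[float]:
    """Action probabilities proportional to exp(eta * sum_i Q_i(s, a)) for one state."""
    num_actions = len(past_q_values[0])
    summed = [sum(q[a] for q in past_q_values) for a in range(num_actions)]
    shift = max(summed)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - shift)) for s in summed]
    total = sum(weights)
    return [w / total for w in weights]


# Two past Q estimates for a single state with two actions (made-up numbers).
past_q = [[1.0, 0.0], [1.0, 0.0]]
probs = softmax_policy(past_q, eta=0.5)
```

Since only the sum (equivalently, the average times the iteration count) of past Q-values enters the policy, a single function that approximates that average would be enough, which is the motivation given in the abstract.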

algorithm, experience replay, implementation, (12 more...)

arXiv:2102.12608... 

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

Cassel, Asaf, Koren, Tomer

arXiv.org Machine Learning | Feb-24-2021

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
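For readers unfamiliar with the setting, the standard LQR problem and the notion of regret can be written as follows. The symbols $A$, $B$, $Q$, $R$, $w_t$, and $J^\star$ are the usual textbook notation and are assumptions here, since the abstract does not fix a notation; the paper's exact regret definition may differ in detail.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad
c_t = x_t^\top Q x_t + u_t^\top R u_t, \qquad
\mathrm{Regret}_T = \sum_{t=1}^{T} c_t - T J^\star
$$

Here $x_t$ is the state, $u_t$ the control, $w_t$ the noise, and $J^\star$ the optimal long-run average cost. A model-free method does not estimate $A$ and $B$ explicitly, and a regret of order $\sqrt{T}$ means the average excess cost per step shrinks like $1/\sqrt{T}$.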

assumption, learning, probability, (10 more...)

arXiv:2102.12608

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
