AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction (Section 1.1). MIT Press, Cambridge, MA, 1998.

Of Moments and Matching: Trade-offs and Treatments in Imitation Learning

Swamy, Gokul, Choudhury, Sanjiban, Wu, Zhiwei Steven, Bagnell, J. Andrew

arXiv.org Machine Learning, Mar-4-2021

We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between learner and expert behavior, we are able to derive bounds on policy performance that apply for all algorithms in each of these classes, the first to our knowledge. We also introduce the notion of recoverability, implicit in many previous analyses of imitation learning, which allows us to cleanly delineate how well each algorithmic family is able to mitigate compounding errors. We derive two novel algorithm templates, AdVIL and AdRIL, with strong guarantees, simple implementation, and competitive empirical performance.

algorithm, learner, learning, (13 more...)

arXiv.org Machine Learning

2103.03236

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Inverse Reinforcement Learning with Explicit Policy Estimates

Sanghvi, Navyata, Usami, Shinnosuke, Sharma, Mohit, Groeger, Joachim, Kitani, Kris

arXiv.org Machine Learning, Mar-4-2021

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.

approximation-based method, gradient, mce-irl, (13 more...)

arXiv.org Machine Learning

2103.02863

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Routing algorithms as tools for integrating social distancing with emergency evacuation

Tsai, Yi-Lin, Rastogi, Chetanya, Kitanidis, Peter K., Field, Christopher B.

arXiv.org Artificial Intelligence, Mar-4-2021

In this study, we explore the implications of integrating social distancing with emergency evacuation when a hurricane approaches a major city during the COVID-19 pandemic. Specifically, we compare DNN (Deep Neural Network)-based and non-DNN methods for generating evacuation strategies that minimize evacuation time while allowing for social distancing in rescue vehicles. A central question is whether a DNN-based method provides sufficient extra efficiency to accommodate social distancing in a time-constrained evacuation operation. We describe the problem as a Capacitated Vehicle Routing Problem and solve it using one non-DNN solution (Sweep Algorithm) and one DNN-based solution (Deep Reinforcement Learning). The DNN-based solution can provide decision-makers with more efficient routing than the non-DNN solution. Although the DNN-based solution can save considerable time in evacuation routing, it does not come close to compensating for the extra time required for social distancing, and its advantage disappears as the vehicle capacity approaches the number of people per household.
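
For readers unfamiliar with the non-DNN baseline named in this abstract, the following is a minimal sketch of the classic sweep heuristic for a capacitated vehicle routing problem. It is not the authors' implementation: the function name `sweep_routes`, the coordinates, and the demand values are hypothetical, and the visiting order inside each route is simply left in angular order rather than optimized.

```python
import math

def sweep_routes(depot, stops, demands, capacity):
    """Group stops into vehicle routes by sweeping a ray around the depot.

    depot: (x, y); stops: list of (x, y); demands: people to pick up per stop;
    capacity: seats per vehicle. Returns a list of routes (lists of stop indices).
    """
    # Sort stop indices by polar angle as seen from the depot.
    order = sorted(
        range(len(stops)),
        key=lambda i: math.atan2(stops[i][1] - depot[1], stops[i][0] - depot[0]),
    )
    routes, current, load = [], [], 0
    for i in order:
        if demands[i] > capacity:
            raise ValueError(f"stop {i} exceeds vehicle capacity")
        # Start a new vehicle when the next stop would overfill the current one.
        if load + demands[i] > capacity:
            routes.append(current)
            current, load = [], 0
        current.append(i)
        load += demands[i]
    if current:
        routes.append(current)
    return routes

# Hypothetical example: four households of two people around a depot at the origin.
stops = [(1, 0), (0, 1), (-1, 0), (0, -1)]
demands = [2, 2, 2, 2]
routes_full = sweep_routes((0, 0), stops, demands, capacity=4)
# Halving the usable seats (a stand-in for social distancing) needs more trips here.
routes_distanced = sweep_routes((0, 0), stops, demands, capacity=2)
```

In this toy case the smaller capacity forces one vehicle trip per household, which is the regime the abstract describes as vehicle capacity approaching the number of people per household.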

dnn-based solution, non-dnn solution, social distancing, (13 more...)

arXiv.org Artificial Intelligence

2103.03413

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.06)
North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > South Carolina (0.04)
(9 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Learning to Schedule DAG Tasks

Hua, Zhigang, Qi, Feng, Liu, Gan, Yang, Shuang

arXiv.org Artificial Intelligence, Mar-4-2021

Scheduling computational tasks represented by directed acyclic graphs (DAGs) is challenging because of its complexity. Conventional scheduling algorithms rely heavily on simple heuristics such as shortest job first (SJF) and critical path (CP), and often yield poor scheduling quality. In this paper, we present a novel learning-based approach to scheduling DAG tasks. The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG, one at a time, to enforce ordering (i.e., priorities of execution and resource allocation) of "tricky" job nodes. By doing so, the original DAG scheduling problem is dramatically reduced to a much simpler proxy problem, on which heuristic scheduling algorithms such as SJF and CP can be efficiently improved. Our approach can be easily applied to any existing heuristic scheduling algorithm. On the TPC-H benchmark dataset, we show that our learning-based approach can significantly improve over popular heuristic algorithms and consistently achieves the best performance among several methods under a variety of settings.
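
To make the baseline heuristics concrete, here is a minimal sketch of list scheduling on a DAG with a critical-path (CP) priority. It rests on simplifying assumptions that are not taken from the paper: a fixed number of identical workers, known task durations, and no other resource constraints. The function name and the toy graph are hypothetical.

```python
import heapq
from functools import lru_cache

def critical_path_schedule(durations, children, num_workers):
    """Return (makespan, start_times) for CP-priority list scheduling."""
    # Count unfinished parents so we know when a task becomes ready.
    parents_left = {t: 0 for t in durations}
    for kids in children.values():
        for k in kids:
            parents_left[k] += 1

    @lru_cache(maxsize=None)
    def cp_length(task):
        # Longest duration-weighted path from this task to any sink.
        tails = (cp_length(k) for k in children.get(task, ()))
        return durations[task] + max(tails, default=0)

    # Ready tasks as a max-heap on critical-path length (negated for heapq).
    ready = [(-cp_length(t), t) for t, n in parents_left.items() if n == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, task)
    now, start_times = 0, {}
    while ready or running:
        # Fill idle workers with the highest-priority ready tasks.
        while ready and len(running) < num_workers:
            _, task = heapq.heappop(ready)
            start_times[task] = now
            heapq.heappush(running, (now + durations[task], task))
        # Advance time to the next completion and release its children.
        now, done = heapq.heappop(running)
        for k in children.get(done, ()):
            parents_left[k] -= 1
            if parents_left[k] == 0:
                heapq.heappush(ready, (-cp_length(k), k))
    return now, start_times

# Hypothetical toy DAG: "a" must finish before "c"; "b" and "d" are independent.
durations = {"a": 1, "b": 3, "c": 5, "d": 3}
children = {"a": ["c"]}
makespan, start_times = critical_path_schedule(durations, children, num_workers=2)
```

Adding an extra edge to `children` before calling the scheduler is one way to picture the edge-insertion idea in the abstract: the heuristic stays the same, but the ordering it must respect changes.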

algorithm, makespan, node, (16 more...)

arXiv.org Artificial Intelligence

2103.03412

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
Europe > France (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Efficient UAV Trajectory-Planning using Economic Reinforcement Learning

Khalil, Alvi Ataur, Byrne, Alexander J, Rahman, Mohammad Ashiqur, Manshaei, Mohammad Hossein

arXiv.org Artificial Intelligence, Mar-3-2021

Advances in unmanned aerial vehicle (UAV) design have opened up applications as varied as surveillance, firefighting, cellular networks, and delivery. Additionally, due to decreases in cost, systems employing fleets of UAVs have become popular. The uniqueness of UAVs in systems creates a novel set of trajectory or path planning and coordination problems. Environments include many more points of interest (POIs) than UAVs, with obstacles and no-fly zones. The proposed system is built around economic theory, in particular an auction mechanism in which UAVs trade assigned POIs. We formulate the path planning problem as a multi-agent economic game, where agents can cooperate and compete for resources. We then translate the problem into a partially observable Markov decision process (POMDP), which is solved using a reinforcement learning (RL) model deployed on each agent. As the system computes task distributions via UAV cooperation, it is highly resilient to any change in the swarm size. Our proposed network and economic game architecture can effectively coordinate the swarm as an emergent phenomenon while maintaining the swarm's operation.

Unmanned aerial vehicles (UAVs) are applicable to a wide-ranging set of problems such as firefighting, security monitoring, agriculture, edge computing, 3D mapping, and network support [1]. Firefighting problems center on tracking and finding fires, whereas security applications focus on monitoring and finding targets. Agricultural problems, in turn, center on field monitoring and data harvesting, while edge computing and network support focus on data harvesting and load reaction. All of these problems can be abstracted to a set of partially observed points that must be reached in the shortest time possible, after which some task must be carried out in the vicinity of each point. Swarm surveillance missions are essential in both civilian and military contexts, where solutions must be secure, reliable, and efficient.

agent, q-learning, uav, (16 more...)

arXiv.org Artificial Intelligence

2103.02676

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > Canada > Alberta (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games (0.93)
Law Enforcement & Public Safety > Fire & Emergency Services (0.74)
Aerospace & Defense > Aircraft (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Self-play Learning Strategies for Resource Assignment in Open-RAN Networks

Wang, Xiaoyang, Thomas, Jonathan D, Piechocki, Robert J, Kapoor, Shipra, Santos-Rodriguez, Raul, Parekh, Arjun

arXiv.org Artificial Intelligence, Mar-3-2021

Open Radio Access Network (ORAN) is being developed with the aim of democratising access and lowering the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is disaggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, using an AlphaGo Zero-inspired neural Monte-Carlo Tree Search (MCTS). Experiments on a representative 2D bin packing environment and on real site data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.

assignment, requirement, resource assignment, (13 more...)

arXiv.org Artificial Intelligence

2103.02649

Country: Europe > United Kingdom > England > Bristol (0.05)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Telecommunications (1.00)
Information Technology (1.00)
Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
(2 more...)

Reinforcement Learning with External Knowledge by using Logical Neural Networks

Kimura, Daiki, Chaudhury, Subhajit, Wachi, Akifumi, Kohita, Ryosuke, Munawar, Asim, Tatsubori, Michiaki, Gray, Alexander

arXiv.org Artificial Intelligence, Mar-3-2021

Conventional deep reinforcement learning methods are sample-inefficient and usually require a large number of training trials before convergence. Since such methods operate on an unconstrained action set, they can lead to useless actions. A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic. An LNN functions as an end-to-end differentiable network that minimizes a novel contradiction loss to learn interpretable rules. In this paper, we utilize LNNs to define an inference graph using basic logical operations, such as AND and NOT, for faster convergence in reinforcement learning. Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources in an LNN-based, logically constrained framework, such as action shielding and guide. Our results empirically demonstrate that our method converges faster than a model-free reinforcement learning method that does not have such logical constraints.
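
Action shielding can be illustrated without any neural component: a rule set marks some actions as inadmissible in the current state, and the agent picks the best-valued action among those that remain. The sketch below uses plain Python predicates in place of an LNN, so the rule, the state fields, the action names, and the function names are all hypothetical stand-ins rather than the authors' method.

```python
def shielded_greedy_action(q_values, state, rules):
    """Pick the highest-valued action that no rule forbids in this state.

    q_values: dict mapping action name -> estimated value.
    rules: list of functions (state, action) -> True if the action is forbidden.
    """
    allowed = {
        action: value
        for action, value in q_values.items()
        if not any(rule(state, action) for rule in rules)
    }
    if not allowed:
        # Fall back to the unshielded choice if the rules forbid everything.
        allowed = q_values
    return max(allowed, key=allowed.get)

# Hypothetical rule: moving west is useless when a wall blocks the way.
def no_walking_into_walls(state, action):
    return action == "go west" and state.get("wall_west", False)

q = {"go west": 0.9, "go east": 0.4, "take key": 0.1}
chosen = shielded_greedy_action(q, {"wall_west": True}, [no_walking_into_walls])
unshielded = shielded_greedy_action(q, {"wall_west": False}, [no_walking_into_walls])
```

With the wall present, the shield removes the top-valued but useless action and the agent falls through to the next best one; without the wall, the choice is the ordinary greedy one.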

lnn-shielding, reinforcement, west room, (12 more...)

arXiv.org Artificial Intelligence

2103.02363

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Visualizing MuZero Models

de Vries, Joery A., Voskuil, Ken S., Moerland, Thomas M., Plaat, Aske

arXiv.org Artificial Intelligence, Mar-3-2021

MuZero, a model-based reinforcement learning algorithm that uses a value equivalent dynamics model, achieved state-of-the-art performance in Chess, Shogi and the game of Go. In contrast to standard forward dynamics models that predict a full next state, value equivalent models are trained to predict a future value, thereby emphasizing value relevant information in the representations. While value equivalent models have shown strong empirical success, there is no research yet that visualizes and investigates what types of representations these models actually learn. Therefore, in this paper we visualize the latent representation of MuZero agents. We find that action trajectories may diverge between observation embeddings and internal state transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value equivalent algorithms.

latent space, muzero, regularization, (16 more...)

arXiv.org Artificial Intelligence

2102.12924

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Applied Reinforcement Learning with Python PDF

#artificialintelligence, Mar-2-2021, 06:29:28 GMT

Delve into the world of reinforcement learning algorithms and apply them to different use cases via Python. This book covers important topics such as policy gradients and Q-learning and utilizes frameworks such as TensorFlow, Keras, and OpenAI Gym. Applied Reinforcement Learning with Python introduces you to the theory behind reinforcement learning (RL) algorithms and the code that will be used to implement them. You will take a guided tour through the features of OpenAI Gym, from utilizing standard libraries to creating your own environments, then discover how to frame reinforcement learning problems so you can research, develop, and deploy RL-based solutions.

applied reinforcement learning, openai gym

#artificialintelligence

Industry: Education > Focused Education > Special Education (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.60)

Automating Trading and Market Making With Artificial Intelligence

#artificialintelligence, Mar-2-2021, 05:25:43 GMT

The goal is to capture information in a market's order books and use that information to predict market movement/direction. That prediction can enable repricing of orders and more efficient market making. Such an approach allows the market maker to provide liquidity whilst making profits at the same time. Market makers are essential to modern markets. They provide the markets with necessary liquidity and make sure the bid/ask spread is reasonably narrow to allow efficient purchasing.

information, market maker, order book, (13 more...)

#artificialintelligence

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
