AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A Deep Ensemble Multi-Agent Reinforcement Learning Approach for Air Traffic Control

Ghosh, Supriyo, Laguna, Sean, Lim, Shiau Hong, Wynter, Laura, Poonawala, Hasan

arXiv.org Artificial Intelligence, Apr-3-2020

Air traffic control is an example of a highly challenging operational problem that is readily amenable to human expertise augmentation via decision support technologies. In this paper, we propose a new intelligent decision making framework that leverages multi-agent reinforcement learning (MARL) to dynamically suggest adjustments of aircraft speeds in real-time. The goal of the system is to enhance the ability of an air traffic controller to provide effective guidance to aircraft to avoid air traffic congestion, near-miss situations, and to improve arrival timeliness. We develop a novel deep ensemble MARL method that can concisely capture the complexity of the air traffic control problem by learning to efficiently arbitrate between the decisions of a local kernel-based RL model and a wider-reaching deep MARL model. The proposed method is trained and evaluated on an open-source air traffic management simulator developed by Eurocontrol. Extensive empirical results on a real-world dataset including thousands of aircraft demonstrate the feasibility of using multi-agent RL for the problem of en-route air traffic control and show that our proposed deep ensemble MARL method significantly outperforms three state-of-the-art benchmark approaches.

agent, aircraft, learning, (15 more...)

2004.01387

Country:

Europe (0.04)
North America > United States (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

da Costa, Paulo R. de O., Rhuggenaath, Jason, Zhang, Yingqian, Akcay, Alp

arXiv.org Artificial Intelligence, Apr-3-2020

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
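For readers unfamiliar with the 2-opt operator mentioned in the abstract, the following is a minimal sketch of a single 2-opt move on a tour. It is not the authors' implementation: the function names `tour_length` and `two_opt_move` are hypothetical, and the indices `i` and `j` are chosen by hand here, whereas in the paper a learned stochastic policy selects them.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of a closed tour over the given points."""
    n = len(tour)
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Apply one 2-opt move: reverse the segment tour[i..j] (inclusive)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Corners of a unit square; the initial tour crosses itself along both diagonals.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
crossing_tour = [0, 2, 1, 3]

# Reversing positions 1..2 removes the crossing and yields the square's perimeter.
improved_tour = two_opt_move(crossing_tour, 1, 2)
```

In this toy case the move shortens the tour from 2 + 2√2 to 4. An improvement heuristic of the kind the abstract describes would repeat such moves, with the policy deciding which pair of positions to use at each step.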

learning 2-opt heuristic, node, representation, (13 more...)

2004.01608

Country: Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

r/artificial - Google DeepMind 'Agent 57' Beats Human Baselines Across Atari Games Suite

#artificialintelligence, Apr-2-2020, 12:09:06 GMT

DeepMind's breakthroughs in recent years are well documented, and the UK AI company has repeatedly stressed that mastering Go, StarCraft, etc. were not ends in themselves but rather steps toward artificial general intelligence (AGI). DeepMind's latest achievement stays on path: Agent57 is the ultimate gamer, the first deep reinforcement learning (RL) agent to top human baseline scores on all games in the Atari57 test set.

atari game suite, beat human baseline, deepmind, (2 more...)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Microsoft's Mahjong-winning AI could lead to sophisticated finance market prediction systems

#artificialintelligence, Apr-2-2020, 03:13:42 GMT

Industry: Media > News (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

Gros, Sebastien, Zanon, Mario, Bemporad, Alberto

arXiv.org Artificial Intelligence, Apr-2-2020

For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set, ensuring that the system safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of $Q$-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of $Q$-learning though a simple alternative solves the issue, while simple corrections can be used in the context of policy gradient methods in order to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.
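The projection step the abstract refers to can be illustrated with a deliberately simple safe set. The sketch below assumes the safe set is an axis-aligned box, for which the Euclidean projection reduces to clipping each coordinate. The box and the function name `project_onto_box` are assumptions made for illustration; the paper itself considers safe projections based on robust MPC, and the corrections it proposes for the learning updates are not shown here.

```python
def project_onto_box(action, lower, upper):
    """Euclidean projection of an action vector onto the box [lower, upper].

    For a box, the closest safe point is obtained by clipping each
    coordinate independently.
    """
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


# Hypothetical action proposed by a learned policy, and assumed box bounds.
proposed_action = [1.5, -0.2, 0.3]
lower_bounds = [0.0, 0.0, 0.0]
upper_bounds = [1.0, 1.0, 1.0]

# The action actually applied to the system is the projected one.
safe_action = project_onto_box(proposed_action, lower_bounds, upper_bounds)
```

Note that the applied action differs from the proposed one whenever the proposal lies outside the safe set, which is the mismatch between policy output and executed input that the abstract identifies as a potential source of disruption to learning.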

artificial intelligence, projection, reinforcement learning, (15 more...)

2004.00915

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Keramati, Ramtin, Brunskill, Emma

arXiv.org Artificial Intelligence, Apr-2-2020

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by the reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.

algorithm, observation space, optimal policy, (13 more...)

doi: 10.1145/3320435.3320471

2004.01223

Country:

North America > United States > California > Santa Clara County > Stanford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Middle East > Cyprus > Larnaka > Larnaca (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Mao, Weichao, Zhang, Kaiqing, Miehling, Erik, Başar, Tamer

arXiv.org Artificial Intelligence, Apr-2-2020

Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose three natural embeddings, based on finite-memory truncation, principal component analysis, and recurrent neural networks. The outputs of these embeddings are then used as the information state and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.
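Of the three embeddings listed in the abstract, finite-memory truncation is the simplest to picture: the agent keeps only its most recent observation-action pairs and treats that window as its information state. The sketch below is one plausible reading, not the authors' code; the class name `FiniteMemoryEmbedding` and the `window` parameter are hypothetical.

```python
from collections import deque


class FiniteMemoryEmbedding:
    """Compress an agent's local history to its last `window` steps."""

    def __init__(self, window):
        # A bounded deque silently discards the oldest entry once full.
        self.history = deque(maxlen=window)

    def update(self, observation, action):
        """Record one step and return the truncated history as a tuple."""
        self.history.append((observation, action))
        return tuple(self.history)


# Toy usage with symbolic observations and actions.
embedding = FiniteMemoryEmbedding(window=2)
embedding.update("o1", "a1")
embedding.update("o2", "a2")
info_state = embedding.update("o3", "a3")
```

After the third step the oldest pair has been dropped, so the information state has fixed size regardless of how long the episode runs. The returned tuple is hashable and could serve as a key in a tabular learner or be flattened into the input of a function approximator.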

agent, information, information state, (15 more...)

2004.01098

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.64)

Multi-agent Reinforcement Learning for Networked System Control

Chu, Tianshu, Chinchali, Sandeep, Katti, Sachin

arXiv.org Machine Learning, Apr-2-2020

This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Based on experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control, an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance.
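The abstract does not define the spatial discount factor, so the following sketch rests on an assumption: that each agent's training reward is a sum of all agents' rewards, with each term scaled down by a factor `alpha` raised to the network distance between the two agents. The function name `spatially_discounted_reward` and this exact weighting are illustrative only and should be checked against the paper.

```python
def spatially_discounted_reward(rewards, distances, alpha):
    """Weight each agent's reward by alpha ** distance and sum.

    rewards[k]   -- reward observed by agent k
    distances[k] -- assumed hop distance from the learning agent to agent k
    alpha        -- spatial discount factor in [0, 1]
    """
    return sum((alpha ** d) * r for r, d in zip(rewards, distances))


# Learning agent at distance 0, one neighbor at distance 1, one at distance 2.
rewards = [1.0, 2.0, 4.0]
distances = [0, 1, 2]

local_objective = spatially_discounted_reward(rewards, distances, alpha=0.5)
```

Under this reading, `alpha = 1` recovers a fully shared global reward and small `alpha` approaches a purely local one, so the factor trades off cooperation against the noise that distant agents introduce into each local learner's signal.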

communication, neurcomm, scenario, (13 more...)

2004.01339

Country:

Europe > Monaco (0.06)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.40)

Industry:

Transportation > Infrastructure & Services (0.66)
Transportation > Ground > Road (0.66)
Consumer Products & Services > Travel (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to cooperate: Emergent communication in multi-agent navigation

Kajić, Ivana, Aygün, Eser, Precup, Doina

arXiv.org Machine Learning, Apr-2-2020

Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans. We show that agents performing a cooperative navigation task in various gridworld environments learn an interpretable communication protocol that enables them to efficiently, and in many cases, optimally, solve the task. An analysis of the agents' policies reveals that emergent signals spatially cluster the state space, with signals referring to specific locations and spatial directions such as "left", "up", or "upper left room". Using populations of agents, we show that the emergent protocol has basic compositional structure, thus exhibiting a core property of natural language.

agent, receiver, sender, (16 more...)

2004.01097

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications

Schneckenreither, Manuel

arXiv.org Machine Learning, Apr-2-2020

Although in recent years reinforcement learning has become very popular, the number of successful applications to different kinds of operations research problems is rather small. Reinforcement learning is based on the well-studied dynamic programming technique and thus also aims at finding the best stationary policy for a given Markov Decision Process, but in contrast does not require any model knowledge. The policy is assessed solely on consecutive states (or state-action pairs), which are observed while an agent explores the solution space. The contributions of this paper are manifold. First, we provide deep theoretical insights into the widely applied standard discounted reinforcement learning framework, which give rise to the understanding of why these algorithms are inappropriate when permanently provided with non-zero rewards, such as costs or profit. Second, we establish a novel near-Blackwell-optimal reinforcement learning algorithm. In contrast to former methods, it assesses the average reward per step separately and thus prevents the incautious combination of different types of state values. Thereby, the Laurent Series expansion of the discounted state values forms the foundation for this development and also provides the connection between the two approaches. Finally, we prove the viability of our algorithm on a challenging problem set, which includes a well-studied M/M/1 admission control queuing system. In contrast to standard discounted reinforcement learning, our algorithm infers the optimal policy on all tested problems. The insight is that, in the operations research domain, machine learning techniques have to be adapted and advanced before they can be applied successfully.
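The abstract does not spell out its argument against plain discounting, so the sketch below only illustrates a standard, related scale effect. When every step yields the same non-zero reward r, the discounted value equals r / (1 - gamma), which is the leading term of the Laurent series expansion and grows without bound as gamma approaches 1, while the average reward per step stays at r. The function name `discounted_value` and the finite `horizon` used to approximate the infinite sum are assumptions for illustration.

```python
def discounted_value(reward, gamma, horizon=100_000):
    """Approximate the infinite discounted sum of a constant per-step reward."""
    value = 0.0
    discount = 1.0
    for _ in range(horizon):
        value += discount * reward
        discount *= gamma
    return value


# The same constant reward of 1.0 per step under two discount factors.
value_gamma_090 = discounted_value(1.0, gamma=0.9)
value_gamma_099 = discounted_value(1.0, gamma=0.99)
```

The two values are close to 10 and 100 respectively, even though the underlying behavior (one unit of reward per step) is identical. Separating out the average-reward term, as the abstract describes, is one way to keep this gamma-dependent offset from dominating the quantities used to compare states.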

algorithm, average reward, reinforcement, (14 more...)

2004.00857

Country:

Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
