AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Intelligent Roundabout Insertion using Deep Reinforcement Learning

Capasso, Alessandro Paolo, Bacchiani, Giulio, Molinari, Daniele

arXiv.org Artificial Intelligence, Jan-3-2020

The study and development of autonomous vehicles have attracted increasing interest in recent years, becoming hot topics in both academia and industry. One of the main research areas in this field is control systems, in particular planning and decision-making problems. The basic approaches for scheduling high-level maneuver execution modules are based on the concepts of time-to-collision (van der Horst and Hogema, 1994) and headway control (Hatipoglu et al., 1996). In order to add interpretation capabilities to the system, several approaches model the driving decision-making problem as a Partially Observable Markov Decision Process (POMDP; Spaan, 2012), as in (Liu et al., 2015) for urban scenarios and in (Song et al., 2016) for intersection handling. A further extension is proposed in (Bandyopadhyay et al., 2012), where a Mixed Observability Markov Decision Process (MOMDP) (Ong et al., 2010) is used to model uncertainties in agents' intentions. However, since vehicles are assumed to behave in a deterministic way, the aforementioned approaches handle many situations with excessive prudence and would not be able to enter a busy roundabout.

agent, roundabout, vehicle, (13 more...)

arXiv.org Artificial Intelligence

2001.00786

Country:

Europe > Italy (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

Neunert, Michael, Abdolmaleki, Abbas, Wulfmeier, Markus, Lampe, Thomas, Springenberg, Jost Tobias, Hafner, Roland, Romano, Francesco, Buchli, Jonas, Heess, Nicolas, Riedmiller, Martin

arXiv.org Machine Learning, Jan-2-2020

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or fully discrete action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear.
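To make the idea of a hybrid action concrete, here is a minimal sketch of a policy that emits one discrete mode and one continuous setpoint at the same time. It assumes the two components are independent given the state (a factorized policy); the class name `HybridPolicy`, its fields, and the single Gaussian setpoint are hypothetical choices for illustration and are not taken from the abstract.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HybridPolicy:
    """Hypothetical factorized policy: categorical over modes, Gaussian over one setpoint."""
    mode_logits: List[float]  # unnormalized log-probabilities, one per discrete mode
    mean: float               # mean of the continuous action
    std: float                # standard deviation of the continuous action (must be > 0)

    def mode_probs(self) -> List[float]:
        # Numerically stable softmax over the discrete logits.
        m = max(self.mode_logits)
        exps = [math.exp(logit - m) for logit in self.mode_logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng: random.Random) -> Tuple[int, float]:
        # Draw the discrete and the continuous component independently.
        modes = range(len(self.mode_logits))
        mode = rng.choices(modes, weights=self.mode_probs())[0]
        setpoint = rng.gauss(self.mean, self.std)
        return mode, setpoint

    def log_prob(self, mode: int, setpoint: float) -> float:
        # Under the independence assumption, the joint log-probability
        # is the sum of the categorical and Gaussian log-probabilities.
        log_p_mode = math.log(self.mode_probs()[mode])
        z = (setpoint - self.mean) / self.std
        log_p_cont = -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)
        return log_p_mode + log_p_cont
```

Because `log_prob` covers both components at once, a single policy-gradient style update could in principle adjust the mode choice and the setpoint together, rather than fixing one of them with a heuristic.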

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2001.00449

Country:

Europe > United Kingdom (0.14)
Asia > Japan (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Oil & Gas (0.67)
Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

Chen, Letian, Paleja, Rohan, Ghuy, Muyleng, Gombolay, Matthew

arXiv.org Machine Learning, Jan-2-2020

Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity - there are an infinite number of possible reward functions that could explain an expert's demonstration and 2) heterogeneity - human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans' strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy's objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task.

reward function, task reward, trajectory, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3319502.3374791

2001.00503

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Education (0.93)
Leisure & Entertainment > Sports > Tennis (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Reasoning on Knowledge Graphs with Debate Dynamics

Hildebrandt, Marcel, Serna, Jorge Andres Quintero, Ma, Yunpu, Ringsquandl, Martin, Joblin, Mitchell, Tresp, Volker

arXiv.org Machine Learning, Jan-2-2020

We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal of promoting the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction tasks. We find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.

agent, argument, classification, (16 more...)

arXiv.org Machine Learning

2001.00461

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > United Kingdom > England (0.04)
Europe > Germany (0.04)
Asia > Tajikistan (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Regional Government (0.68)
Leisure & Entertainment > Sports > Baseball (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

2019 in Review: 10 AI Papers That Made an Impact

#artificialintelligence, Jan-1-2020, 03:01:32 GMT

The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of papers. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year's 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total. As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero).

machine learning, natural language, reinforcement learning, (20 more...)

#artificialintelligence

Country:

North America > Canada > Ontario > Toronto (0.30)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.34)

Options of Interest: Temporal Abstraction with Interest Functions

Khetarpal, Khimya, Klissarov, Martin, Chevalier-Boisvert, Maxime, Bacon, Pierre-Luc, Precup, Doina

arXiv.org Machine Learning, Jan-1-2020

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.
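One way to picture how an interest function could generalize an initiation set is to let it reweight the policy over options: an option with zero interest in a state cannot be picked, and intermediate values softly favor some options over others. The sketch below assumes this multiplicative reweighting followed by normalization; the function name and the dictionary-based representation are hypothetical and are not taken from the abstract.

```python
from typing import Dict


def interest_weighted_policy(option_probs: Dict[str, float],
                             interest: Dict[str, float]) -> Dict[str, float]:
    """Reweight a policy over options by each option's interest in the current state.

    option_probs: probability of choosing each option before reweighting.
    interest:     non-negative interest of each option in the current state.
    """
    # Multiply each option's probability by its interest value.
    weighted = {o: interest[o] * p for o, p in option_probs.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("no option has positive interest in this state")
    # Renormalize so the result is again a probability distribution.
    return {o: w / total for o, w in weighted.items()}
```

With a binary interest (1 inside the initiation set, 0 outside), this reduces to restricting the choice to options that may start in the current state, which is the classical initiation-set behaviour.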

agent, interest function, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2001.00271

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies

Sohn, Sungryull, Woo, Hyunjae, Choi, Jongwook, Lee, Honglak

arXiv.org Machine Learning, Jan-1-2020

We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph which describes a set of subtasks and their dependencies that are unknown to the agent. The agent needs to quickly adapt to the task over a few episodes during the adaptation phase to maximize the return in the test phase. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference (MSGI), which infers the latent parameter of the task by interacting with the environment and maximizes the return given the latent parameter. To facilitate learning, we adopt an intrinsic reward inspired by upper confidence bound (UCB) that encourages efficient exploration. Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter, and to adapt more efficiently than existing meta RL and hierarchical RL methods.
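For readers unfamiliar with UCB, the following sketch shows the classic UCB-style exploration bonus that such an intrinsic reward is inspired by. It is the textbook form, c * sqrt(ln t / n), and is not claimed to be the exact reward used in the paper; the function name and the constant `c` are illustrative assumptions.

```python
import math


def ucb_bonus(total_steps: int, visit_count: int, c: float = 1.0) -> float:
    """Classic UCB-style exploration bonus (textbook form, not the paper's reward).

    total_steps: number of decisions made so far (t >= 1).
    visit_count: how often this particular choice has been tried (n >= 0).
    c:           hypothetical scaling constant for the bonus.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if visit_count == 0:
        # Untried choices get an unbounded bonus so they are tried first.
        return math.inf
    return c * math.sqrt(math.log(total_steps) / visit_count)
```

The bonus shrinks as a choice is tried more often and grows slowly with the total number of steps, so rarely tried choices are periodically revisited.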

agent, subtask, subtask graph, (14 more...)

arXiv.org Machine Learning

2001.00248

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Humble Book Bundle: Python & Machine Learning by Packt

#artificialintelligence, Dec-31-2019, 18:31:47 GMT

Whether you're a Python developer new to machine learning or want to deepen your knowledge of the latest developments, our latest ebook bundle from Packt is perfect for you! Get titles like Python Machine Learning, Reinforcement Learning Algorithms with Python, and Machine Learning Projects for Mobile Applications. Plus, your purchase will support Innocent Lives Foundation! Normally, the total cost for the ebooks in this bundle is as much as $1,051. Here at Humble Bundle, you choose the price and increase your contribution to upgrade your bundle! This bundle has a minimum $1 purchase.

humble book bundle, packt, python & machine learning, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

Parisi, Simone, Tateo, Davide, Hensel, Maximilian, D'Eramo, Carlo, Peters, Jan, Pajarinen, Joni

arXiv.org Machine Learning, Dec-31-2019

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent also receives rewards that create suboptimal modes of the objective function, it will likely prematurely stop exploring. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. Results also suggest that our approach scales gracefully with the size of the environment. Source code is available at https://github.com/sparisi/visit-value-explore
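The decoupling described above can be illustrated with a small tabular sketch that keeps two tables: `q` for extrinsic return and `w` for a discounted, count-based novelty signal. This is a simplified stand-in and not the authors' algorithm (their code is at the URL above): the novelty reward 1/sqrt(n), the optimistic initialization of `w`, the mixing weight `beta`, and all names are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable


class DecoupledExplorer:
    """Tabular sketch: q tracks extrinsic return, w tracks long-term novelty."""

    def __init__(self, actions: Iterable[Hashable], alpha: float = 0.5,
                 gamma: float = 0.9, beta: float = 1.0) -> None:
        self.actions = list(actions)
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        # Novelty rewards are at most 1, so 1 / (1 - gamma) upper-bounds w.
        # Starting there makes untried actions look maximally novel.
        w_init = 1.0 / (1.0 - gamma)
        self.q = defaultdict(float)            # extrinsic action values
        self.w = defaultdict(lambda: w_init)   # exploration action values
        self.counts = defaultdict(int)         # visitation counts n(s, a)

    def update(self, s, a, reward: float, s_next, done: bool) -> None:
        # Count-based novelty signal (an assumed form for this sketch).
        self.counts[(s, a)] += 1
        novelty = 1.0 / math.sqrt(self.counts[(s, a)])
        discount = 0.0 if done else self.gamma
        # Off-policy (max-based) targets, computed separately per table,
        # so the extrinsic target is never mixed with the novelty signal.
        q_next = max(self.q[(s_next, b)] for b in self.actions)
        w_next = max(self.w[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (reward + discount * q_next - self.q[(s, a)])
        self.w[(s, a)] += self.alpha * (novelty + discount * w_next - self.w[(s, a)])

    def act(self, s):
        # Behave greedily with respect to a weighted sum of both values.
        return max(self.actions,
                   key=lambda b: self.q[(s, b)] + self.beta * self.w[(s, b)])
```

Because `w` is bootstrapped through the next state, an action can look attractive when it leads toward rarely visited regions several steps away, which is the intuition behind planning exploration over a long horizon.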

exploration, survey article, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

2001.00119

Country:

Europe > Germany (0.14)
Europe > Finland (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Oil & Gas > Upstream (0.66)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Towards Neural-Guided Program Synthesis for Linear Temporal Logic Specifications

Camacho, Alberto, McIlraith, Sheila A.

arXiv.org Artificial Intelligence, Dec-31-2019

Synthesizing a program that realizes a logical specification is a classical problem in computer science. We examine a particular type of program synthesis, where the objective is to synthesize a strategy that reacts to a potentially adversarial environment while ensuring that all executions satisfy a Linear Temporal Logic (LTL) specification. Unfortunately, exact methods to solve so-called LTL synthesis via logical inference do not scale. In this work, we cast LTL synthesis as an optimization problem. We employ a neural network to learn a Q-function that is then used to guide search, and to construct programs that are subsequently verified for correctness. Our method is unique in combining search with deep learning to realize LTL synthesis. In our experiments the learned Q-function provides effective guidance for synthesis problems with relatively small specifications.

logic & formal reasoning, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1912.1343

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
