AITopics

1901.01977

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Osa, Takayuki, Tangkaratt, Voot, Sugiyama, Masashi

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

arXiv.org Machine Learning Jan-4-2019

Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy gradient method. This framework is derived by leveraging the analogy between a monolithic policy in standard RL and a hierarchical policy in HRL by using a deterministic option policy. Experimental results indicate that our HRL approach can learn a diversity of options and that it can enhance the performance of RL in continuous control tasks.
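The two-level structure described in this abstract (a gating policy that picks an option from option-values, then a deterministic option policy that produces the action) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the softmax rule, the `temperature` parameter, and the function names `softmax`, `select_option`, and `act` are all assumptions introduced here.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_option(option_values, temperature=1.0, rng=random):
    """Gating step (assumed softmax rule): sample an option index
    with probability proportional to exp(value / temperature)."""
    probs = softmax(option_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def act(state, option_values, option_policies, temperature=1.0, rng=random):
    """Pick an option, then let that option's deterministic policy
    map the state to an action. Returns (option_index, action)."""
    option = select_option(option_values, temperature, rng)
    return option, option_policies[option](state)
```

Here `option_values` stands in for the option-value function evaluated at the current state, and each entry of `option_policies` is a plain callable from state to action.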

adinfohrl, advantage function, option policy

1901.01365

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Choudhury*, Rohan, Swamy*, Gokul, Hadfield-Menell, Dylan, Dragan, Anca

On the Utility of Model Learning in HRI

arXiv.org Machine Learning Jan-4-2019

Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on a "theory of mind" to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as under different scenarios in which the assumptions behind the robot's theory of mind for the human are wrong, as they inevitably will be in practice. We find that there is a significant sample complexity advantage to theory of mind methods and that they are more robust to covariate shift, but that when enough interaction data is available, black box approaches eventually dominate.

An age-old debate that still animates the halls of computer science, robotics, neuroscience, and psychology departments alike is that between model-based and model-free (reinforcement) learning. Model-based methods work by building a model of the world (the dynamics that tells an agent how the world state will change as a consequence of its actions) and optimizing a cost or reward function under the learned model. In contrast, model-free methods never attempt to explicitly learn how the world works. Instead, the agent learns a policy directly from acting in the world and learning from what works and what does not. Model-free methods are appealing because the agent implicitly learns what it needs to know about the world, and only what it needs. Model-based methods are appealing because knowing how the world works might enable the agent to generalize beyond its experience, and possibly be able to explain why a decision is the best one. In neuro- and cognitive science, the debate is about which paradigm best describes human learning [1], [2].
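The model-free versus model-based distinction drawn above can be made concrete with a small tabular sketch. It is a generic textbook illustration under assumed names (`q_learning_update`, `TabularModel`), not code from the paper: the first function updates action values directly from one experienced transition, while the class estimates the dynamics and rewards that a planner could later optimize against.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Model-free: adjust Q(state, action) from a single transition,
    without ever estimating how the world works."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


class TabularModel:
    """Model-based: estimate transition probabilities and mean rewards
    from counts of observed transitions."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)

    def observe(self, state, action, reward, next_state):
        self.counts[(state, action)][next_state] += 1
        self.reward_sum[(state, action)] += reward

    def visits(self, state, action):
        return sum(self.counts[(state, action)].values())

    def transition_probs(self, state, action):
        """Empirical P(next_state | state, action); empty if unseen."""
        total = self.visits(state, action)
        if total == 0:
            return {}
        return {s2: c / total
                for s2, c in self.counts[(state, action)].items()}

    def mean_reward(self, state, action):
        """Empirical mean reward; 0.0 if the pair was never observed."""
        total = self.visits(state, action)
        return self.reward_sum[(state, action)] / total if total else 0.0
```

A planner such as value iteration would consume `transition_probs` and `mean_reward`; that step is omitted here to keep the sketch short.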

assumption, model-based method, robot

1901.01291

Country:

North America > United States > Oregon (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ceren, Roi

Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning

arXiv.org Artificial Intelligence Jan-4-2019

Optimal decision making with limited or no information in stochastic environments where multiple agents interact is a challenging topic in the realm of artificial intelligence. Reinforcement learning (RL) is a popular approach for arriving at optimal strategies by predicating stimuli, such as the reward for following a strategy, on experience. RL is heavily explored in the single-agent context, but is a nascent concept in multiagent problems. To this end, I propose several principled model-free and partially model-based reinforcement learning approaches for several multiagent settings. In the realm of normative reinforcement learning, I introduce scalable extensions to Monte Carlo exploring starts for partially observable Markov Decision Processes (POMDPs), dubbed MCES-P, where I expand the theory and algorithm to the multiagent setting. I first examine MCES-P with probably approximately correct (PAC) bounds in the context of the multiagent setting, showing MCES-P+PAC holds in the presence of other agents. I then propose a more sample-efficient methodology for antagonistic settings, MCES-IP+PAC. For cooperative settings, I extend MCES-P to the Multiagent POMDP, dubbed MCES-MP+PAC. I then explore the use of reinforcement learning as a methodology in searching for optima in realistic and latent model environments. First, I explore a parameterized Q-learning approach in modeling humans learning to reason in an uncertain, multiagent environment. Next, I propose an implementation of MCES-P, along with image segmentation, to create an adaptive team-based reinforcement learning technique to positively identify the presence of phenotypically-expressed water and pathogen stress in crop fields.
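For readers unfamiliar with the Monte Carlo family that MCES-P builds on, the evaluation step of classical first-visit Monte Carlo control can be sketched as below. This is the textbook fully observable version, not MCES-P or its PAC variants; the function names and the `(state, action, reward)` episode layout are assumptions made for the example.

```python
def episode_returns(rewards, gamma=1.0):
    """Discounted return G_t for every time step, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def first_visit_update(q, counts, episode, gamma=1.0):
    """Average the return following the first visit to each
    (state, action) pair into the running estimate q.

    episode: list of (state, action, reward) tuples.
    q, counts: plain dicts keyed by (state, action), updated in place.
    """
    returns = episode_returns([r for _, _, r in episode], gamma)
    seen = set()
    for t, (state, action, _) in enumerate(episode):
        key = (state, action)
        if key in seen:
            continue  # only the first visit in this episode counts
        seen.add(key)
        counts[key] = counts.get(key, 0) + 1
        old = q.get(key, 0.0)
        q[key] = old + (returns[t] - old) / counts[key]
```

In the exploring-starts scheme, each episode would begin from a randomly chosen state-action pair and the policy would then be made greedy with respect to `q`; both steps are left out of this sketch.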

artificial intelligence, machine learning, reinforcement learning

1901.01325

Country: North America > United States > Georgia > Clarke County > Athens (0.27)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dhiman, Vikas, Banerjee, Shurjo, Griffin, Brent, Siskind, Jeffrey M, Corso, Jason J

A Critical Investigation of Deep Reinforcement Learning for Navigation

arXiv.org Artificial Intelligence Jan-4-2019

The navigation problem is classically approached in two steps: an exploration step, where map-information about the environment is gathered; and an exploitation step, where this information is used to navigate efficiently. Deep reinforcement learning (DRL) algorithms, alternatively, approach the problem of navigation in an end-to-end fashion. Inspired by the classical approach, we ask whether DRL algorithms are able to inherently explore, gather and exploit map-information over the course of navigation. We build upon the work of Mirowski et al. [2017] and introduce a systematic suite of experiments that vary three parameters: the agent's starting location, the agent's target location, and the maze structure. We choose evaluation metrics that explicitly measure the algorithm's ability to gather and exploit map-information. Our experiments show that when trained and tested on the same maps, the algorithm successfully gathers and exploits map-information. However, when trained and tested on different sets of maps, the algorithm fails to transfer the ability to gather and exploit map-information to unseen maps. Furthermore, we find that when the goal location is randomized and the map is kept static, the algorithm is able to gather and exploit map-information but the exploitation is far from optimal. We open-source our experimental suite in the hopes that it serves as a framework for the comparison of future algorithms and leads to the discovery of robust alternatives to classical navigation methods.
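A suite that varies three parameters can be laid out as a small factorial grid. The sketch below assumes each parameter takes one of two levels, "fixed" or "randomized"; that binary split, the factor names, and `experiment_grid` are illustrative choices made here and are not taken from the authors' released suite.

```python
from itertools import product

# Hypothetical factor and level names for the three varied parameters.
FACTORS = ("start_location", "target_location", "maze_structure")
LEVELS = ("fixed", "randomized")


def experiment_grid(factors=FACTORS, levels=LEVELS):
    """Return every combination of levels, one dict per experiment."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]


if __name__ == "__main__":
    for config in experiment_grid():
        print(config)
```

With two levels per factor this yields eight configurations, including the case mentioned in the abstract where the goal is randomized and the map is kept static.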

artificial intelligence, machine learning, reinforcement learning

1802.02274

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Bera, Krishn, Savalia, Tejas, Raju, Bapi

A Computational Framework for Motor Skill Acquisition

arXiv.org Artificial Intelligence Jan-3-2019

There have been numerous attempts to explain general learning behaviours using various cognitive models. Multiple hypotheses have been put forward to qualitatively argue for the best-fit model for the motor skill acquisition task and its variations. In this context, for a discrete sequence production (DSP) task, one of the most insightful models is Verwey's Dual Processor Model (DPM). It largely explains the learning and behavioural phenomena of skilled discrete key-press sequences without providing any concrete computational basis of reinforcement. Therefore, we propose a quantitative explanation for Verwey's DPM hypothesis by experimentally establishing a general computational framework for motor skill learning. We attempt to combine the qualitative and quantitative theories based on a best-fit model of the experimental simulations of variations of dual processor models. The fundamental premise of sequential decision making for skill learning is based on interacting model-based (MB) and model-free (MF) reinforcement learning (RL) processes. Our unifying framework shows that the proposed idea agrees well with Verwey's DPM and Fitts' three phases of skill learning. The accuracy of our model can further be validated by its statistical fit with human-generated data on simple environment tasks such as the grid-world.

machine learning, reinforcement, reinforcement learning

1901.01856

Country: North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.72)
Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Porav, Horia, Newman, Paul

Imminent Collision Mitigation with Reinforcement Learning and Vision

arXiv.org Machine Learning Jan-3-2019

This work examines the role of reinforcement learning in reducing the severity of on-road collisions by controlling velocity and steering in situations in which contact is imminent. We construct a model, given camera images as input, that is capable of learning and predicting the dynamics of obstacles, cars and pedestrians, and train our policy using this model. Two policies that control both braking and steering are compared against a baseline where the only action taken is (conventional) braking in a straight line. The two policies are trained using two distinct reward structures, one where any and all collisions incur a fixed penalty, and a second one where the penalty is calculated based on already established delta-v models of injury severity. The results show that both policies exceed the performance of the baseline, with the policy trained using injury models having the highest performance.
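The difference between the two reward structures can be illustrated with a pair of small functions. This is a sketch under loud assumptions: the logistic form of `injury_probability` and its `intercept` and `slope` values are placeholders chosen for the example, not the established delta-v injury models the authors refer to.

```python
import math


def fixed_penalty_reward(collided, penalty=1.0):
    """Reward structure 1: every collision costs the same amount."""
    return -penalty if collided else 0.0


def injury_probability(delta_v, intercept=-5.0, slope=0.1):
    """Placeholder logistic curve mapping delta-v (change in velocity
    at impact) to a probability of injury. Coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * delta_v)))


def severity_reward(collided, delta_v, scale=1.0):
    """Reward structure 2: the penalty grows with estimated severity."""
    if not collided:
        return 0.0
    return -scale * injury_probability(delta_v)
```

Under the second structure a policy is rewarded for turning an unavoidable crash into a lower delta-v crash, which a fixed penalty cannot express.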

baseline, collision, vehicle

1901.00898

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Li, Chunyuan, Bai, Ke, Li, Jianqiao, Wang, Guoyin, Chen, Changyou, Carin, Lawrence

Adversarial Learning of a Sampler Based on an Unnormalized Distribution

arXiv.org Machine Learning Jan-3-2019

We investigate adversarial learning in the case when only an unnormalized form of the density can be accessed, rather than samples. With insights so garnered, adversarial learning is extended to the case for which one has access to an unnormalized form u(x) of the target density function, but no samples. Further, new concepts in GAN regularization are developed, based on learning from samples or from u(x). The proposed method is compared to alternative approaches, with encouraging results demonstrated across a range of applications, including deep soft Q-learning.
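In the usual sense of the term, having only an unnormalized form u(x) means the target density is known up to a constant. The notation below is a standard way to write this; the symbol Z for the normalizing constant is introduced here and does not appear in the abstract.

```latex
p(x) = \frac{u(x)}{Z}, \qquad Z = \int u(x)\,\mathrm{d}x
```

Z is typically intractable to compute, which is why a sampler has to be learned from u(x) alone rather than from p(x) or from samples.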

adversarial learning, learning, regularization

1901.00612

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Clayton, Nicholas R., Abbass, Hussein

Machine Teaching in Hierarchical Genetic Reinforcement Learning: Curriculum Design of Reward Functions for Swarm Shepherding

arXiv.org Artificial Intelligence Jan-3-2019

The design of reward functions in reinforcement learning is a human skill that comes with experience. Unfortunately, the literature offers no methodology that guides a human in designing a reward function, or that allows a human to transfer reward-design skills to another human in a systematic manner. In this paper, we use Systematic Instructional Design, an approach in human education, to engineer a machine education methodology to design reward functions for reinforcement learning. We demonstrate the methodology in designing a hierarchical genetic reinforcement learner that adopts a neural network representation to evolve a swarm controller for an agent shepherding a boids-based swarm. The results reveal that the methodology is able to guide the design of hierarchical reinforcement learners, with each model in the hierarchy learning incrementally through a multi-part reward function. The hierarchy acts as a decision fusion function that combines the individual behaviours and skills learnt by each instruction to create a smart shepherd to control the swarm.

evolutionary algorithm, machine learning, reinforcement learning

1901.00949

Country: Oceania > Australia (0.28)

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Florensa, Carlos, Degrave, Jonas, Heess, Nicolas, Springenberg, Jost Tobias, Riedmiller, Martin

Self-supervised Learning of Image Embedding for Continuous Control

arXiv.org Artificial Intelligence Jan-3-2019

Operating directly from raw high-dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches assume access to a specified reward, which may require specialized instrumentation of the environment. Furthermore, the obtained policy and representations tend to be task specific and may not transfer well. In this work we investigate completely self-supervised learning of a general image embedding and control primitives, based on finding the shortest time to reach any state. We also introduce a new structure for the state-action value function that builds a connection between model-free and model-based methods, and improves the performance of the learning algorithm. We experimentally demonstrate these findings in three simulated robotic tasks.
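The quantity "shortest time to reach any state" has a simple tabular analogue: the minimum number of steps between states in a known, deterministic transition graph. The breadth-first search below is only that analogue, offered as intuition; the paper learns the quantity from images, and the `transitions` layout and function name are assumptions made for this sketch.

```python
from collections import deque


def steps_to_reach(transitions, start):
    """Minimum number of steps from `start` to every reachable state.

    transitions: dict mapping a state to an iterable of the states
    reachable from it in exactly one step.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in dist:  # first time reached is the shortest
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist
```

A goal-reaching controller could then pick, in each state, a successor that lowers the step count to the goal, with no task reward needed.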

machine learning, reinforcement learning, trajectory

1901.00943

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)