AITopics

1906.05838

Country: North America > United States > California (0.14)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lee, Lisa, Eysenbach, Benjamin, Parisotto, Emilio, Xing, Eric, Levine, Sergey, Salakhutdinov, Ruslan

Efficient Exploration via State Marginal Matching

To solve tasks with sparse rewards, reinforcement learning algorithms must be equipped with suitable exploration techniques. However, it is unclear what underlying objective is being optimized by existing exploration algorithms, or how they can be altered to incorporate prior knowledge about the task. Most importantly, it is difficult to use exploration experience from one task to acquire exploration strategies for another task. We address these shortcomings by learning a single exploration policy that can quickly solve a suite of downstream tasks in a multi-task setting, amortizing the cost of learning to explore. We recast exploration as a problem of State Marginal Matching (SMM): we learn a mixture of policies for which the state marginal distribution matches a given target state distribution, which can incorporate prior knowledge about the task. Without any prior knowledge, the SMM objective reduces to maximizing the marginal state entropy. We optimize the objective by reducing it to a two-player, zero-sum game, where we iteratively fit a state density model and then update the policy to visit states with low density under this model. While many previous algorithms for exploration employ a similar procedure, they omit a crucial historical averaging step, without which the iterative procedure does not converge to a Nash equilibria. To parallelize exploration, we extend our algorithm to use mixtures of policies, wherein we discover connections between SMM and previously-proposed skill learning methods based on mutual information. On complex navigation and manipulation tasks, we demonstrate that our algorithm explores faster and adapts more quickly to new tasks.

exploration, game theory, upstream oil & gas, (17 more...)

1906.05274

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hierarchical Decision Making by Generating and Following Natural Language Instructions

Hu, Hengyuan, Yarats, Denis, Gong, Qucheng, Tian, Yuandong, Lewis, Mike

We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales. We gather a dataset of 76 thousand pairs of instructions and executions from human play, and train instructor and executor models. Experiments show that models using natural language as a latent variable significantly outperform models that directly imitate human actions. The compositional structure of language proves crucial to its effectiveness for action representation. We also release our code, models and data.

machine learning, natural language, reinforcement learning, (20 more...)

1906.00744

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Overview (0.93)
Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Li, Changjian, Czarnecki, Krzysztof

A Micro-Objective Perspective of Reinforcement Learning

arXiv.org Machine LearningJun-12-2019

The standard reinforcement learning (RL) formulation considers the expectation of the (discounted) cumulative reward. This is limiting in applications where we are concerned with not only the expected performance, but also the distribution of the performance. In this paper, we introduce micro-objective reinforcement learning --- an alternative RL formalism that overcomes this issue. In this new formulation, a RL task is specified by a set of micro-objectives, which are constructs that specify the desirability or undesirability of events. In addition, micro-objectives allow prior knowledge in the form of temporal abstraction to be incorporated into the global RL objective. The generality of this formalism, and its relations to single/multi-objective RL, and hierarchical RL are discussed.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1905.10016

Country:

North America > United States > Colorado > Denver County > Denver (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Transportation (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Chebotar, Yevgen, Molchanov, Artem, Bechtle, Sarah, Righetti, Ludovic, Meier, Franziska, Sukhatme, Gaurav

Meta-Learning via Learned Loss

We present a meta-learning approach based on learning an adaptive, high-dimensional loss function that can generalize across multiple tasks and different model architectures. We develop a fully differentiable pipeline for learning a loss function targeted at maximizing the performance of an optimizee trained using this loss function. We observe that the loss landscape produced by our learned loss significantly improves upon the original task-specific loss. We evaluate our method on supervised and reinforcement learning tasks. Furthermore, we show that our pipeline is able to operate in sparse reward and self-supervised reinforcement learning scenarios.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1906.05374

Country:

North America > United States > California (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Asadulaev, Arip, Stein, Gideon, Kuznetsov, Igor, Filchenkov, Andrey

Jacobian Policy Optimizations

arXiv.org Machine LearningJun-12-2019

Recently, natural policy gradient algorithms gained widespread recognition due to their strong performance in reinforcement learning tasks [12, 13]. However, their major drawback is the need to secure the policy being in a "trust region" and meanwhile allowing for sufficient exploration. The main objective of this study was to present an approach which models dynamical isometry of agents policies by estimating conditioning of its Jacobian at individual points in the environment space. We present a Jacobian Policy Optimization algorithm for policy optimization, which dynamically adapts the trust interval with respect to policy conditioning. The suggested approach was tested across a range of Atari environments. This paper offers some important insights into an improvement of policy optimization in reinforcement learning tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1906.05437

Country:

Asia > Russia (0.05)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-12-2019

Reinforcement Learning of Spatio-Temporal Point Processes

Zhu, Shixiang, Li, Shuang, Xie, Yao

Spatio-temporal event data is ubiquitous in various applications, such as social media, crime events, and electronic health records. Spatio-temporal point processes offer a versatile framework for modeling such event data, as it can jointly capture spatial and temporal dependency. A key question is to estimate the generative model for such point processes, which enables the subsequent machine learning tasks. Existing works mainly focus on parametric models for the conditional intensity function, such as the widely used multi-dimensional Hawkes processes. However, parametric models tend to lack flexibility in tackling real data. On the other hand, non-parametric for spatio-temporal point processes tend to be less interpretable. We introduce a novel and flexible semi-parametric spatial-temporal point processes model, by combining spatial statistical models based on heterogeneous Gaussian mixture diffusion kernels, whose parameters are represented using neural networks. We learn the model using a reinforcement learning framework, where the reward function is defined via the maximum mean discrepancy (MMD) of the empirical processes generated by the model and the real data. Experiments based on real data show the superior performance of our method relative to the state-of-the-art.

machine learning, point process, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1906.05467

Country: North America > United States > New York (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.54)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Zhu, Ming, Liu, Xiao-Yang, Wang, Xiaodong

Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks

Unmanned aerial vehicles (UAVs) are envisioned to complement the 5G communication infrastructure in future smart cities. Hot spots easily appear in road intersections, where effective communication among vehicles is challenging. UAVs may serve as relays with the advantages of low price, easy deployment, line-of-sight links, and flexible mobility. In this paper, we study a UAV-assisted vehicular network where the UAV jointly adjusts its transmission power and bandwidth allocation under 3D flight to maximize the total throughput. First, we formulate a Markov Decision Process (MDP) problem by modeling the mobility of vehicles and the state transitions caused by the UAV's 3D flight. Secondly, we solve the target problem using a deep reinforcement learning method, namely, the deep deterministic policy gradient, and propose three solutions with different control objectives. Thirdly, in a simplified model with small state and action spaces, we verify the optimality of proposed algorithms. Comparing with two baseline schemes, we demonstrate the effectiveness of proposed algorithms in a realistic model.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1906.05015

Country: Asia > China (0.29)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Telecommunications (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Jurgenson, Tom, Groshev, Edward, Tamar, Aviv

Sub-Goal Trees -- a Framework for Goal-Directed Trajectory Prediction and Optimization

Many AI problems, in robotics and other domains, are goal-directed, essentially seeking a trajectory leading to some goal state. In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization. Interestingly, most all prior work in imitation and reinforcement learning builds on a sequential trajectory representation -- calculating the next state in the trajectory given its predecessors. We propose a different perspective: a goal-conditioned trajectory can be represented by first selecting an intermediate state between start and goal, partitioning the trajectory into two. Then, recursively, predicting intermediate points on each sub-segment, until a complete trajectory is obtained. We call this representation a sub-goal tree, and building on it, we develop new methods for trajectory prediction, learning, and optimization. We show that in a supervised learning setting, sub-goal trees better account for trajectory variability, and can predict trajectories exponentially faster at test time by leveraging a concurrent computation. Then, for optimization, we derive a new dynamic programming equation for sub-goal trees, and use it to develop new planning and reinforcement learning algorithms. These algorithms, which are not based on the standard Bellman equation, naturally account for hierarchical sub-goal structure in a task. Empirical results on motion planning domains show that the sub-goal tree framework significantly improves both accuracy and prediction time.

machine learning, reinforcement learning, trajectory, (16 more...)

1906.05329

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Eysenbach, Benjamin, Salakhutdinov, Ruslan, Levine, Sergey

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and distance metric over collision-free paths. Reinforcement learning excels at learning policies and the relative values of states, but fails to plan over long horizons. Despite the successes of each method in various domains, tasks that require reasoning over long horizons with limited feedback and high-dimensional observations remain exceedingly challenging for both planning and reinforcement learning algorithms. Frustratingly, these sorts of tasks are potentially the most useful, as they are simple to design (a human only need to provide an example goal state) and avoid reward shaping, which can bias the agent towards finding a sub-optimal solution. We introduce a general control algorithm that combines the strengths of planning and reinforcement learning to effectively solve these tasks. Our aim is to decompose the task of reaching a distant goal state into a sequence of easier tasks, each of which corresponds to reaching a subgoal. Planning algorithms can automatically find these waypoints, but only if provided with suitable abstractions of the environment -- namely, a graph consisting of nodes and edges. Our main insight is that this graph can be constructed via reinforcement learning, where a goal-conditioned value function provides edge weights, and nodes are taken to be previously seen observations in a replay buffer. Using graph search over our replay buffer, we can automatically generate this sequence of subgoals, even in image-based environments. Our algorithm, search on the replay buffer (SoRB), enables agents to solve sparse reward tasks over one hundred steps, and generalizes substantially better than standard RL algorithms.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1906.05253

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)