 Reinforcement Learning


Discounted Reinforcement Learning is Not an Optimization Problem

arXiv.org Artificial Intelligence

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. This is because it is not an optimization problem -- it lacks an objective function. After substantiating these claims, we go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches for reinforcement learning in continuing tasks, such as average reward.
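For readers unfamiliar with the two formulations the abstract contrasts, the following block writes out their standard textbook definitions. The notation ($\gamma$, $R_t$, $S_t$, $\pi$, $h$) is assumed here for illustration and is not taken from the paper itself.

```latex
% Discounted value of policy \pi from state s, with 0 \le \gamma < 1:
v_\pi^\gamma(s) = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]

% Average reward of policy \pi in a continuing task:
r(\pi) = \lim_{h \to \infty} \frac{1}{h} \sum_{t=1}^{h} \mathbb{E}_\pi\!\left[ R_t \right]
```

Note that the discounted value is a function of the state, giving one number per state, whereas the average reward assigns a single scalar to each policy.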


Manufacturing Dispatching using Reinforcement and Transfer Learning

arXiv.org Artificial Intelligence

An efficient dispatching rule in the manufacturing industry is key to ensuring on-time product delivery and minimal past-due and inventory costs. Manufacturing, especially in the developed world, is moving towards on-demand manufacturing, meaning a high-mix, low-volume product mix. This requires efficient dispatching that can work in dynamic and stochastic environments, meaning it allows for quick response to new orders received and can work over a disparate set of shop floor settings. In this paper we address this problem of dispatching in manufacturing. Using reinforcement learning (RL), we propose a new design that formulates the shop floor state as a 2-D matrix, incorporates job slack time into the state representation, and defines lateness and tardiness reward functions for dispatching. However, maintaining a separate RL model for each production line on a manufacturing shop floor is costly and often infeasible. To address this, we enhance our deep RL model with an approach for dispatching policy transfer. This increases policy generalization and saves time and cost for model training and data collection. Experiments show that: (1) our approach performs the best in terms of total discounted reward and average lateness and tardiness; (2) the proposed policy transfer approach reduces training time and increases policy generalization.
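The abstract above refers to slack time, lateness, and tardiness without defining them. A minimal sketch using the usual scheduling definitions is given below; the `Job` class, the function names, and the tardiness-only reward are hypothetical illustrations, not the paper's actual state representation or reward design.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions for illustration."""
    due_time: float        # time by which the job should be finished
    remaining_work: float  # processing time the job still needs


def slack_time(job: Job, now: float) -> float:
    """Time left until the due date after subtracting the remaining work."""
    return job.due_time - now - job.remaining_work


def lateness(completion_time: float, due_time: float) -> float:
    """Signed deviation from the due date (negative means early)."""
    return completion_time - due_time


def tardiness(completion_time: float, due_time: float) -> float:
    """Lateness clipped at zero, so early jobs earn no credit."""
    return max(0.0, completion_time - due_time)


def dispatch_reward(completion_time: float, due_time: float) -> float:
    """Assumed reward shape: penalise tardiness only."""
    return -tardiness(completion_time, due_time)
```

A job with negative slack cannot finish on time even if dispatched immediately, which is one reason slack is a natural feature to include in a dispatching state.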


Zero Shot Learning on Simulated Robots

arXiv.org Artificial Intelligence

In this work we present a method for leveraging data from one source to learn how to do multiple new tasks. Task transfer is achieved using a self-model that encapsulates the dynamics of a system and serves as an environment for reinforcement learning. To study this approach, we train self-models on various robot morphologies, using randomly sampled actions. Using a self-model, an initial state and corresponding actions, we can predict the next state. This predictive self-model is then used by a standard reinforcement learning algorithm to accomplish tasks without ever seeing a state from the "real" environment. These trained policies allow the robots to successfully achieve their goals in the "real" environment. We demonstrate that not only is training on the self-model far more data efficient than learning even a single task, but also that it allows for learning new tasks without necessitating any additional data collection, essentially allowing zero-shot learning of new tasks.


Quantized Reinforcement Learning (QUARL)

arXiv.org Artificial Intelligence

Recent work has shown that quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, applied traditionally to image-based models, work with the same efficacy to the sequential decision making process in reinforcement learning remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the effects of quantization on various deep reinforcement learning policies with the intent to reduce their computational resource demands. We apply techniques such as post-training quantization and quantization aware training to a spectrum of reinforcement learning tasks (such as Pong, Breakout, BeamRider and more) and training algorithms (such as PPO, A2C, DDPG, and DQN). Across this spectrum of tasks and learning algorithms, we show that policies can be quantized to 6-8 bits of precision without loss of accuracy. We also show that certain tasks and reinforcement learning algorithms yield policies that are more difficult to quantize due to their effect of widening the models' distribution of weights and that quantization aware training consistently improves results over post-training quantization and oftentimes even over the full precision baseline. Finally, we demonstrate real-world applications of quantization for reinforcement learning. We use half-precision training to train a Pong model 50% faster, and we deploy a quantized reinforcement learning based navigation policy to an embedded system, achieving an 18$\times$ speedup and a 4$\times$ reduction in memory usage over an unquantized policy.
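To make the idea of post-training quantization mentioned above concrete, here is a minimal sketch of uniform affine quantization applied to a flat list of weights. It is a generic illustration under assumed names (`quantize`, `dequantize`) and does not reproduce the specific scheme, framework, or bit allocation used in the study.

```python
def quantize(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] (uniform affine scheme)."""
    levels = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant tensor, where the range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + w_min for c in codes]
```

Each reconstructed weight differs from the original by at most about half a quantization step (`scale / 2`), so a wider weight distribution means a larger step and a larger error at a fixed bit width.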


Building smart robots using AI ROS: Part 1

#artificialintelligence

The Robot Operating System (ROS) is a flexible framework for writing robot software. It is a collection of tools, libraries and conventions that aim to simplify the task of creating complex and robust robot behavior across a wide variety of robotic platforms. ROS is used to create applications for a physical robot without depending on the actual machine, thus saving cost and time. These applications can be transferred onto the physical robot without modifications. The decision-making capability of the robots can be aided with AI.


Deep Reinforcement Learning for Logistics at Instadeep w/ Karim Beguir

#artificialintelligence

Today we are joined by Karim Beguir, Co-Founder and CEO of InstaDeep. InstaDeep, based in Tunisia, Africa, is focused on building advanced decision-making systems for the enterprise. Karim's goal is to show that advanced AI and Deep Learning is taking place in Africa, solving real-world problems and building a new generation of talent in the AI industry. With offices around the world, InstaDeep works with large companies in multiple industries with this episode focusing on logistical challenges, like ride-sharing and container shipping. These problems require decision-making in complex environments with a large number of choices.


Path-planning microswimmers can swim efficiently in turbulent flows

arXiv.org Machine Learning

We develop an adversarial-reinforcement learning scheme for microswimmers in statistically homogeneous and isotropic turbulent fluid flows, in both two (2D) and three dimensions (3D). We show that this scheme allows microswimmers to find non-trivial paths, which enable them to reach a target on average in less time than a naïve microswimmer, which tries, at any instant of time and at a given position in space, to swim in the direction of the target. We use pseudospectral direct numerical simulations (DNSs) of the 2D and 3D (incompressible) Navier-Stokes equations to obtain the turbulent flows. We then introduce passive microswimmers that try to swim along a given direction in these flows; the microswimmers do not affect the flow, but they are advected by it. Two non-dimensional control parameters play important roles in our learning scheme: (a) the ratio $\tilde{V}_s$ of the microswimmer's bare velocity $V_s$ and the root-mean-square (rms) velocity $u_{rms}$ of the turbulent fluid; and (b) the product $\tilde{B}$ of the microswimmer-response time $B$ and the rms vorticity $\omega_{rms}$ of the fluid. We show that, in a substantial part of the $\tilde{V}_s-\tilde{B}$ plane, the average time required for the microswimmers to reach the target, by using our adversarial-learning scheme, eventually reduces below the average time taken by microswimmers that follow the naïve strategy.
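The naïve baseline and the two control parameters described above can be sketched in a few lines. The function names are hypothetical; the code only encodes what the abstract states (swim straight towards the target; $\tilde{V}_s = V_s / u_{rms}$; $\tilde{B} = B\,\omega_{rms}$) and makes no claim about the paper's simulation code.

```python
import math


def naive_heading(position, target):
    """Unit vector pointing from the swimmer's position to the target (2D or 3D)."""
    delta = [t - p for p, t in zip(position, target)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        # Already at the target: no preferred direction.
        return [0.0] * len(delta)
    return [d / norm for d in delta]


def control_parameters(v_s, u_rms, b, omega_rms):
    """Return the non-dimensional pair (V_s / u_rms, B * omega_rms)."""
    return v_s / u_rms, b * omega_rms
```

The naïve swimmer ignores the local flow entirely, which is why a learned strategy that exploits the flow can reach the target sooner on average.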


Learning Robust Representations with Graph Denoising Policy Network

arXiv.org Machine Learning

Graph representation learning, aiming to learn low-dimensional representations which capture the geometric dependencies between nodes in the original graph, has gained increasing popularity in a variety of graph analysis tasks, including node classification and link prediction. Existing representation learning methods based on graph neural networks and their variants rely on the aggregation of neighborhood information, which makes them sensitive to noise in the graph, e.g. In this paper, we propose Graph Denoising Policy Network (short for GDPNet) to learn robust representations from noisy graph data through reinforcement learning. GDPNet first selects signal neighborhoods for each node, and then aggregates the information from the selected neighborhoods to learn node representations for the downstream tasks. Specifically, in the signal neighborhood selection phase, GDPNet optimizes the neighborhood for each target node by formulating the process of removing noisy neighborhoods as a Markov decision process and learning a policy with task-specific rewards received from the representation learning phase. In the representation learning phase, GDPNet aggregates features from signal neighbors to generate node representations for downstream tasks, and provides task-specific rewards to the signal neighbor selection phase. These two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust representations for nodes. Note that GDPNet is naturally an inductive model which can leverage both graph structure and the associated node feature information to efficiently generate representations for unseen nodes. Experimental results on the node classification task demonstrate the effectiveness of GDPNet, which outperforms state-of-the-art graph representation learning methods on several well-studied datasets. Additionally, we show that, with a carefully designed reward function, GDPNet is mathematically equivalent to solving the submodular maximization problem, which theoretically guarantees the best approximation to the optimal solution with GDPNet.


Causal Induction from Visual Observations for Goal Directed Tasks

arXiv.org Artificial Intelligence

Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world. In this work, we propose to endow an artificial agent with the capability of causal reasoning for completing goal-directed tasks. We develop learning-based approaches to inducing causal knowledge in the form of directed acyclic graphs, which can be used to contextualize a learned goal-conditional policy to perform tasks in novel environments with latent causal structures. We leverage attention mechanisms in our causal induction model and goal-conditional policy, enabling us to incrementally generate the causal graph from the agent's visual observations and to selectively use the induced graph for determining actions. Our experiments show that our method effectively generalizes towards completing new tasks in novel environments with previously unseen causal structures.


Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning

arXiv.org Artificial Intelligence

Abstract: In the multi-objective reinforcement learning (MORL) paradigm, the relative importance of each environment objective is often unknown prior to training, so agents must learn to specialize their behavior to optimize different combinations of environment objectives that are specified post-training. These are typically linear combinations, so the agent is effectively parameterized by a weight vector that describes how to balance competing environment objectives. However, many real-world behaviors require nonlinear combinations of objectives. Additionally, the conversion between desired behavior and weightings is often unclear. In this work, we explore the use of a language based on propositional logic with quantitative semantics, in place of weight vectors, for specifying nonlinear behaviors in an interpretable way. We use a recurrent encoder to encode logical combinations of objectives, and train a MORL agent to generalize over these encodings. We test our agent in several grid worlds with various objectives and show that our agent can generalize to many never-before-seen specifications with performance comparable to single-policy baseline agents. We also demonstrate our agent's ability to generate meaningful policies when presented with novel specifications and quickly specialize to novel specifications. 1 Introduction: Reinforcement Learning (RL) is a method for learning behavior policies by maximizing expected reward through interactions with an environment. RL has grown in popularity as RL agents have excelled at increasingly complex tasks, including board games (Silver et al., 2016), video games (Mnih et al., 2015), robotic control (Haarnoja et al., 2018), and other high dimensional, complex tasks.
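The abstract above describes a propositional-logic language with quantitative semantics but does not spell the semantics out. The sketch below shows one common choice, assumed here purely for illustration: conjunction as `min`, disjunction as `max`, and negation as sign flip, so that a formula evaluates to a real-valued score rather than true or false. The nested-tuple encoding and the `evaluate` function are hypothetical and may differ from the paper's actual language.

```python
def evaluate(spec, values):
    """Return a real-valued satisfaction score for a nested-tuple specification.

    spec is ("obj", name), ("not", sub), ("and", sub, ...) or ("or", sub, ...).
    values maps objective names to their current real-valued scores.
    """
    op = spec[0]
    if op == "obj":
        return values[spec[1]]
    if op == "not":
        return -evaluate(spec[1], values)
    if op == "and":
        # A conjunction is only as satisfied as its weakest part.
        return min(evaluate(s, values) for s in spec[1:])
    if op == "or":
        # A disjunction is as satisfied as its strongest part.
        return max(evaluate(s, values) for s in spec[1:])
    raise ValueError(f"unknown operator: {op!r}")
```

Because `min` and `max` are nonlinear, a specification such as "a and (b or not c)" expresses a trade-off that no single weight vector over the objectives can represent.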