 Reinforcement Learning


General solutions for nonlinear differential equations: a deep reinforcement learning approach

arXiv.org Machine Learning

Physicists use differential equations to describe the physical dynamical world, and the solutions of these equations constitute our understanding of the world. Over hundreds of years, scientists have developed ways to solve these equations, namely analytical solutions and numerical solutions. However, some complex equations may have no analytical solution, and numerical solutions may incur extreme computational cost when accuracy is the first consideration. Solving equations is high-level human intellectual work and a crucial step towards general artificial intelligence (AI), where deep reinforcement learning (DRL) may contribute. This work makes the first attempt at applying DRL to solve nonlinear differential equations, including ordinary differential equations (ODEs) and partial differential equations (PDEs), in both discretized and continuous form, with the governing equations (physical laws) embedded in the DRL network. The DRL network consists of an actor that outputs a policy of solution approximations and a critic that outputs an evaluation of the actor's output solution. A deterministic policy network is employed as the actor, and the governing equations are embedded in the critic. The effectiveness of the DRL solver is discussed for the Schrödinger equation, the Navier-Stokes equations, the Van der Pol equation, Burgers' equation and the equation of motion.
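To make the idea of scoring a candidate solution against a governing equation concrete, here is a minimal sketch. It is not the paper's critic network: it only assumes that a candidate can be judged by how far it is from satisfying the equation, shown here for the Van der Pol equation x'' - mu (1 - x^2) x' + x = 0. The function names, the finite-difference step `h` and the sample times are all illustrative assumptions.

```python
import math

def van_der_pol_residual(x, t, mu, h=1e-3):
    """Residual of x'' - mu * (1 - x^2) * x' + x = 0 at time t.

    `x` is a candidate solution (a callable). Derivatives are
    approximated with central finite differences of step `h`
    (an illustrative choice, not taken from the paper).
    """
    x0 = x(t)
    dx = (x(t + h) - x(t - h)) / (2 * h)
    d2x = (x(t + h) - 2 * x0 + x(t - h)) / (h * h)
    return d2x - mu * (1 - x0 ** 2) * dx + x0

def mean_squared_residual(x, times, mu):
    """Average squared residual over the sample times; lower is better."""
    return sum(van_der_pol_residual(x, t, mu) ** 2 for t in times) / len(times)

times = [0.1 * k for k in range(50)]

# With mu = 0 the equation reduces to x'' + x = 0, which cos(t) solves exactly.
good = mean_squared_residual(math.cos, times, mu=0.0)

# x(t) = t does not satisfy the equation, so its residual is large.
bad = mean_squared_residual(lambda t: t, times, mu=0.0)
```

A learned solver would replace the fixed candidates with the actor's output and use a score of this kind as a training signal; how the paper actually embeds the equations in the critic is not specified in the abstract.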


Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

arXiv.org Artificial Intelligence

Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals. This is especially true during the initial learning stages, when exploratory behaviour can increase the delay between specific actions and their effects. Many new or popular approaches for learning these distant correlations employ backpropagation through time (BPTT), but this technique requires storing observation traces long enough to span the interval between cause and effect. Besides memory demands, learning dynamics like vanishing gradients and slow convergence due to infrequent weight updates can reduce BPTT's practicality; meanwhile, although online recurrent network learning is a developing topic, most approaches are not efficient enough to use as replacements. We propose a simple, effective memory strategy that can extend the window over which BPTT can learn without requiring longer traces. We explore this approach empirically on a few tasks and discuss its implications.
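As a generic illustration of what "low-pass" means for a memory, the sketch below applies an exponential moving average to an input stream. This is an assumption made for illustration only: the abstract does not describe the authors' architecture, and `low_pass_trace` and `alpha` are hypothetical names. The point is simply that a slowly decaying state keeps a trace of an old event for longer.

```python
def low_pass_trace(inputs, alpha):
    """Exponential moving average: m_t = alpha * m_{t-1} + (1 - alpha) * x_t."""
    memory = 0.0
    trace = []
    for x in inputs:
        memory = alpha * memory + (1.0 - alpha) * x
        trace.append(memory)
    return trace

# A single pulse at t = 0 followed by silence.
pulse = [1.0] + [0.0] * 19

fast = low_pass_trace(pulse, alpha=0.5)   # forgets the pulse quickly
slow = low_pass_trace(pulse, alpha=0.95)  # retains the pulse for longer
```

After 20 steps the `slow` trace still carries a visibly larger remnant of the pulse than the `fast` one, which is the kind of lingering signal a fixed-length BPTT window could pick up.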


Generating Rescheduling Knowledge using Reinforcement Learning in a Cognitive Architecture

arXiv.org Artificial Intelligence

In order to reach higher degrees of flexibility, adaptability and autonomy in manufacturing systems, it is essential to develop new rescheduling methodologies which resort to cognitive capabilities, similar to those found in human beings. Artificial cognition is important for designing planning and control systems that generate and represent knowledge about heuristics for repair-based scheduling. Rescheduling knowledge in the form of decision rules is used to deal with unforeseen events and disturbances reactively in real time, and take advantage of the ability to act interactively with the user to counteract the effects of disruptions. In this work, to achieve the aforementioned goals, a novel approach to generate rescheduling knowledge in the form of dynamic first-order logical rules is proposed. The proposed approach is based on the integration of reinforcement learning with artificial cognitive capabilities involving perception and reasoning/learning skills embedded in the Soar cognitive architecture. An industrial example is discussed showing that the approach enables the scheduling system to assess its operational range in an autonomic way, and to acquire experience through intensive simulation while performing repair tasks.


Towards Autonomous Reinforcement Learning: Automatic Setting of Hyper-parameters using Bayesian Optimization

arXiv.org Artificial Intelligence

With the increasing use of machine learning by industry and scientific communities in a variety of tasks such as text mining, image recognition and self-driving cars, automatic setting of hyper-parameters in learning algorithms is a key factor for achieving satisfactory performance regardless of user expertise in the inner workings of the techniques and methodologies. In particular, for a reinforcement learning algorithm, the efficiency of an agent learning a control policy in an uncertain environment depends heavily on the hyper-parameters used to balance exploration with exploitation. In this work, an autonomous learning framework is proposed that integrates Bayesian optimization with Gaussian process regression to optimize the hyper-parameters of a reinforcement learning algorithm. A bandit-based approach is also presented to balance computational cost against decreasing uncertainty about the Q-values. A gridworld example is used to highlight how hyper-parameter configurations of a learning algorithm (SARSA) are iteratively improved based on two performance functions.


Adversarial Task Transfer from Preference

arXiv.org Machine Learning

Task transfer is extremely important for reinforcement learning, since it makes it possible to generalize to new tasks. One main goal of task transfer in reinforcement learning is to transfer the action policy of an agent from the original basic task to a specific target task. Existing work on this challenging problem usually requires accurate hand-coded cost functions or rich demonstrations on the target task. This strong requirement is difficult, if not impossible, to satisfy in many practical scenarios. In this work, we develop a novel task transfer framework which effectively performs the policy transfer using preference only. The hidden cost model for preference and adversarial training are elegantly combined to perform the task transfer. We give a theoretical analysis of the convergence of the proposed algorithm, and perform extensive simulations on some well-known examples to validate the theoretical results.


Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human/Agent's Demonstration

arXiv.org Artificial Intelligence

Reinforcement learning has enjoyed multiple successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper introduces a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper uses demonstrations from agents and humans, allowing an untrained agent to quickly achieve high performance. We empirically compare with, and highlight the weakness of, HAT and CHAT, methods of transferring knowledge from a source agent/human to a target agent. This paper introduces an effective transfer approach, DRoP, combining the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP dynamically involves the demonstrator's knowledge, integrating it into the reinforcement learning agent's online learning loop to achieve efficient and robust learning.


Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

arXiv.org Artificial Intelligence

Abstract: In recent years, reinforcement learning has achieved many remarkable successes due to the growing adoption of deep learning techniques and the rapid growth in computing power. Nevertheless, it is well known that flat reinforcement learning algorithms are often unable to learn well or data-efficiently in tasks having hierarchical structures. Hierarchical reinforcement learning is a principled approach that is able to tackle these challenging tasks. On the other hand, many real-world tasks are only partially observable, and their state measurements are often imperfect. The problems of RL in such settings can be formulated as a partially observable Markov decision process (POMDP). In this paper, we study hierarchical RL in POMDPs in which the tasks have only partial observability and possess hierarchical properties. We propose a hierarchical deep reinforcement learning approach for learning in hierarchical POMDPs. The deep hierarchical RL algorithm is proposed to apply to both MDP and POMDP learning. We evaluate the proposed algorithm on various challenging hierarchical POMDPs. Reinforcement Learning (RL) [1] is a subfield of machine learning focused on learning a policy in order to maximize total cumulative reward in an unknown environment. RL is divided into two approaches: the value-based approach and the policy-based approach [15]. A typical value-based approach tries to obtain an optimal policy by finding optimal value functions. The value functions are updated using the immediate reward and the discounted value of the next state. Some methods based on this approach are Q-learning, SARSA, and TD-learning [1]. In contrast, the policy-based approach directly learns a parameterized policy that maximizes the cumulative discounted reward.
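The value-based update described above, which uses the immediate reward plus the discounted value of the next state, can be sketched in tabular form for Q-learning and SARSA. This is a minimal illustration rather than code from the paper: the state and action labels, the learning rate `alpha` and the discount `gamma` are assumed values.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical two-state example with unseen entries defaulting to 0.
Q = defaultdict(float)
Q[("s1", "right")] = 1.0
q_learning_update(Q, "s0", "left", r=0.5, s_next="s1", actions=["left", "right"])
```

The two rules differ only in the bootstrap term: Q-learning takes the maximum over next actions, while SARSA uses the value of the next action chosen by the current policy.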


Behavioral Cloning from Observation

arXiv.org Artificial Intelligence

Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without the knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.
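The two phases can be sketched on a toy problem. The abstract only says "a model"; this sketch takes it to be an inverse dynamics model, represented as a lookup table from (state, next state) to action. The chain environment, `step`, and all other names are hypothetical, and a real implementation would use learned function approximators rather than dictionaries.

```python
def step(state, action):
    """Toy deterministic environment: a 1-D chain of states 0..5."""
    return max(0, min(5, state + action))

ACTIONS = (-1, 1)

# Phase 1: the agent gathers its own experience and fits an inverse
# dynamics model, here a table mapping (state, next_state) -> action.
inverse_model = {}
for state in range(6):
    for action in ACTIONS:
        next_state = step(state, action)
        if next_state != state:  # skip no-op transitions at the walls
            inverse_model[(state, next_state)] = action

# Phase 2: the expert demonstration contains states only. The model
# infers the missing actions, and the cloned policy maps each state
# to the inferred expert action.
expert_states = [0, 1, 2, 3, 4, 5]
policy = {
    s: inverse_model[(s, s_next)]
    for s, s_next in zip(expert_states, expert_states[1:])
}

# Roll out the cloned policy from the start state.
state = 0
while state in policy:
    state = step(state, policy[state])
```

Here the rollout reproduces the expert's path to the end of the chain even though no expert actions were ever observed.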


Extracting Action Sequences from Texts Based on Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Extracting action sequences from natural language texts is challenging, as it requires commonsense inferences based on world knowledge. Although there has been work on extracting action scripts, instructions, navigation actions, etc., they require that either the set of candidate actions be provided in advance, or that action descriptions are restricted to a specific form, e.g., description templates. In this paper, we aim to extract action sequences from texts in free natural language, i.e., without any restricted templates, provided the candidate set of actions is unknown. We propose to extract action sequences from texts based on the deep reinforcement learning framework. Specifically, we view "selecting" or "eliminating" words from texts as "actions", and the texts associated with actions as "states". We then build Q-networks to learn the policy of extracting actions and extract plans from the labeled texts. We demonstrate the effectiveness of our approach on several datasets with comparison to state-of-the-art approaches, including online experiments interacting with humans.


Practical Reinforcement Learning Coursera

#artificialintelligence

About this course: Welcome to the Reinforcement Learning course. Here you will find out about:
- foundations of RL methods: value/policy iteration, Q-learning, policy gradient, etc., with math & batteries included
- using deep neural networks for RL tasks, also known as "the hype train"
- state-of-the-art RL algorithms, and how to apply duct tape to them for practical problems.