Goto

Collaborating Authors

 Reinforcement Learning


What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes

arXiv.org Artificial Intelligence

We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome. These explanations describe the outcome an agent is trying to achieve by its actions. We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning. Rather, the information needed for the explanations must be collected in conjunction with training the agent. We derive approaches designed to extract local explanations based on intention for several variants of Q-function approximation and prove consistency between the explanations and the Q-values learned. We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspecting their RL environments and algorithms.


Model-based Reinforcement Learning from Signal Temporal Logic Specifications

arXiv.org Artificial Intelligence

Techniques based on Reinforcement Learning (RL) are increasingly being used to design control policies for robotic systems. RL fundamentally relies on state-based reward functions to encode desired behavior of the robot and bad reward functions are prone to exploitation by the learning agent, leading to behavior that is undesirable in the best case and critically dangerous in the worst. On the other hand, designing good reward functions for complex tasks is a challenging problem. In this paper, we propose expressing desired high-level robot behavior using a formal specification language known as Signal Temporal Logic (STL) as an alternative to reward/cost functions. We use STL specifications in conjunction with model-based learning to design model predictive controllers that try to optimize the satisfaction of the STL specification over a finite time horizon. The proposed algorithm is empirically evaluated on simulations of robotic system such as a pick-and-place robotic arm, and adaptive cruise control for autonomous vehicles.


Playing optical tweezers with deep reinforcement learning: in virtual, physical and augmented environments

arXiv.org Artificial Intelligence

Reinforcement learning was carried out in a simulated environment to learn continuous velocity control over multiple motor axes. This was then applied to a real-world optical tweezers experiment with the objective of moving a laser-trapped microsphere to a target location whilst avoiding collisions with other free-moving microspheres. The concept of training a neural network in a virtual environment has significant potential in the application of machine learning for experimental optimization and control, as the neural network can discover optimal methods for problem solving without the risk of damage to equipment, and at a speed not limited by movement in the physical environment. As the neural network treats both virtual and physical environments equivalently, we show that the network can also be applied to an augmented environment, where a virtual environment is combined with the physical environment. This technique may have the potential to unlock capabilities associated with mixed and augmented reality, such as enforcing safety limits for machine motion or as a method of inputting observations from additional sensors.


Combining Propositional Logic Based Decision Diagrams with Decision Making in Urban Systems

arXiv.org Artificial Intelligence

Solving multiagent problems can be an uphill task due to uncertainty in the environment, partial observability, and scalability of the problem at hand. Especially in an urban setting, there are more challenges since we also need to maintain safety for all users while minimizing congestion of the agents as well as their travel times. To this end, we tackle the problem of multiagent pathfinding under uncertainty and partial observability where the agents are tasked to move from their starting points to ending points while also satisfying some constraints, e.g., low congestion, and model it as a multiagent reinforcement learning problem. We compile the domain constraints using propositional logic and integrate them with the RL algorithms to enable fast simulation for RL.


Adversarial Training for Code Retrieval with Question-Description Relevance Regularization

arXiv.org Artificial Intelligence

Code retrieval is a key task aiming to match natural and programming languages. In this work, we propose adversarial learning for code retrieval, that is regularized by question-description relevance. First, we adapt a simple adversarial learning technique to generate difficult code snippets given the input question, which can help the learning of code retrieval that faces bi-modal and data-scarce challenges. Second, we propose to leverage question-description relevance to regularize adversarial learning, such that a generated code snippet should contribute more to the code retrieval training loss, only if its paired natural language description is predicted to be less relevant to the user given question. Experiments on large-scale code retrieval datasets of two programming languages show that our adversarial learning method is able to improve the performance of state-of-the-art models. Moreover, using an additional duplicate question prediction model to regularize adversarial learning further improves the performance, and this is more effective than using the duplicated questions in strong multi-task learning baselines


D2RL: Deep Dense Architectures in Reinforcement Learning

#artificialintelligence

In this article I want to give a quick presentation of the D2RL paper, applying deep dense architectures of neural networks for Deep Reinforcement Learning. The effect of large and dense network architectures has been long explored in Computer Vision and Deep Learning. The improved performance and other benefits of such dense models compared to shallow ones are widely established. Thereby, in Deep Reinforcement Learning, neural network architectures haven't gotten that much attention yet. Commonly used networks like policy or Q-function are usually only two layers deep.


The Power of Offline Reinforcement Learning

#artificialintelligence

Reinforcement learning has grown rapidly in the past few years, from tabular methods that can only solve simple toy problems to powerful algorithms that tackle incredibly complex problems such as playing Go, learning robotic manipulation skills or controlling autonomous vehicles. Unfortunately, adoption of RL for real-world applications has been somewhat slow, and while current RL methods have proven their ability to find high performing policies for challenging problems with high-dimensional raw observations (such as images), actually using them is often difficult or impractical. This is in stark contrast to supervised learning methods, which are highly prevalent in many fields of industry and research and are utilized with great success. Most RL research papers and implementations are geared towards the online learning setting, in which the agent interacts with an environment and gathers data, using its current policy and some exploration scheme to explore the state-action space and find higher-reward areas. Such online RL algorithms interact with the environment and use the gathered experience either immediately or via some replay buffer to update the policy.


AlphaZero, a novel Reinforcement Learning Algorithm, deployed in JavaScript

#artificialintelligence

In this blog post, you will learn about and implement AlphaZero, an exciting and novel Reinforcement Learning Algorithm, used to beat world-champions in games like Go and Chess. You will use it to master a pen-and-pencil game (Dots and Boxes) and deploy it into a web app, entirely in JavaScript. AlphaZero's key and most exciting aspect is its ability to gain superhuman behavior in board games without relying on external knowledge. AlphaZero learns to master the game by playing against itself (self-play) and learning from those experiences. We will leverage a "simplified, highly flexible, commented, and easy to understand implementation" Python version of AlphaZero from Surag Nair available in Github. You can go ahead and play the game here. The WebApp and JavaScript implementation are available here. This code was ported from this Python implementation.


Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states. From a theoretical perspective, however, RL with functional approximation poses a fundamental challenge to developing algorithms with provable computational and statistical efficiency, due to the need to take into consideration both the exploration-exploitation tradeoff that is inherent in RL and the bias-variance tradeoff that is innate in statistical estimation. To address such a challenge, focusing on the episodic setting where the action-value functions are represented by a kernel function or over-parametrized neural network, we propose the first provable RL algorithm with both polynomial runtime and sample complexity, without additional assumptions on the data-generating model. In particular, for both the kernel and neural settings, we prove that an optimistic modification of the least-squares value iteration algorithm incurs an $\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, where $\delta_{\mathcal{F}}$ characterizes the intrinsic complexity of the function class $\mathcal{F}$, $H$ is the length of each episode, and $T$ is the total number of episodes. Our regret bounds are independent of the number of states and therefore even allows it to diverge, which exhibits the benefit of function approximation.


Detecting and adapting to crisis pattern with context based Deep Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning (DRL) has reached super human levels in complex tasks like game solving (Go and autonomous driving). However, it remains an open question whether DRL can reach human level in applications to financial problems and in particular in detecting pattern crisis and consequently dis-investing. In this paper, we present an innovative DRL framework consisting in two sub-networks fed respectively with portfolio strategies past performances and standard deviations as well as additional contextual features. The second sub network plays an important role as it captures dependencies with common financial indicators features like risk aversion, economic surprise index and correlations between assets that allows taking into account context based information. We compare different network architectures either using layers of convolutions to reduce network's complexity or LSTM block to capture time dependency and whether previous allocations is important in the modeling. We also use adversarial training to make the final model more robust. Results on test set show this approach substantially over-performs traditional portfolio optimization methods like Markowitz and is able to detect and anticipate crisis like the current Covid one.