 Reinforcement Learning


The Animal-AI Environment: Training and Testing Animal-Like Artificial Cognition

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence have been strongly driven by the use of game environments for training and evaluating agents. Games are often accessible and versatile, with well-defined state-transitions and goals allowing for intensive training and experimentation. However, agents trained in a particular environment are usually tested on the same or slightly varied distributions, and solutions do not necessarily imply any understanding. If we want AI systems that can model and understand their environment, we need environments that explicitly test for this. Inspired by the extensive literature on animal cognition, we present an environment that keeps all the positive elements of standard gaming environments, but is explicitly designed for the testing of animal-like artificial cognition. All source-code is publicly available (see appendix).


Reinforcement Learning Tutorial with Open AI Gym

#artificialintelligence

The more I learn, the less I realize I know. This post is Part 2 of a series on reinforcement learning; feel free to read Part 1 first. In this article I will be solving OpenAI Gym's Bipedal Walker environment using the Deep Deterministic Policy Gradient (DDPG) algorithm. OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.


Learning to Manipulate Object Collections Using Grounded State Representations

arXiv.org Artificial Intelligence

We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. First, we train a pair of encoders on raw object pose targets to learn representations that accurately capture the state information of a multi-object environment. Second, we use these encoders in a reinforcement learning algorithm to train image-based policies capable of manipulating many objects. Our pair of encoders consists of one which consumes RGB images and is used in our policy network, and one which directly consumes a set of raw object poses and is used for reward calculation and value estimation. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training.


A Review of Tracking, Prediction and Decision Making Methods for Autonomous Driving

arXiv.org Machine Learning

This literature review focuses on three important aspects of an autonomous car system: tracking (assessing the identity of the actors such as cars, pedestrians or obstacles in a sequence of observations), prediction (predicting the future motion of surrounding vehicles in order to navigate through various traffic scenarios) and decision making (analyzing the available actions of the ego car and their consequences to the entire driving context). For tracking and prediction, approaches based on (deep) neural networks and other, especially stochastic techniques, are reported. For decision making, deep reinforcement learning algorithms are presented, together with methods used to explore different alternative actions, such as Monte Carlo Tree Search.


Automated Lane Change Decision Making using Deep Reinforcement Learning in Dynamic and Uncertain Highway Environment

arXiv.org Artificial Intelligence

Ali Alizadeh, Majid Moghadam, Yunus Bicer, Nazim Kemal Ure, Ugur Yavas and Can Kurtulus. Abstract: Autonomous lane changing is a critical feature of advanced autonomous driving systems that involves several challenges, such as uncertainty in other drivers' behaviors and the trade-off between safety and agility. In this work, we develop a novel simulation environment that emulates these challenges and train a deep reinforcement learning agent that yields consistent performance in a variety of dynamic and uncertain traffic scenarios. Results show that the proposed data-driven approach performs significantly better in noisy environments compared to methods that rely solely on heuristics. Introduction: Advanced Driving Assistance Systems (ADAS) are developed to increase traffic safety by reducing the impact of human errors. The evolution of various levels of driving autonomy has sped up significantly in recent years, aiming to enhance comfort, safety, and driving experience. For a long time, with limited technological resources, automotive stakeholders focused on steady-state maneuvers to achieve driving autonomy.


MDP Playground: Meta-Features in Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) algorithms usually assume their environment to be a Markov Decision Process (MDP). Additionally, they do not try to identify specific features of environments which could help them perform better. Here, we present a few key meta-features of environments: delayed rewards, specific reward sequences, sparsity of rewards, and stochasticity of environments, which may violate the MDP assumptions and adapting to which should help RL agents perform better. While it is very time consuming to run RL algorithms on standard benchmarks, we define a parameterised collection of fast-to-run toy benchmarks in OpenAI Gym by varying these meta-features. Despite their toy nature and low compute requirements, we show that these benchmarks present substantial difficulties to current RL algorithms. Furthermore, since we can generate environments with a desired value for each of the meta-features, we have fine-grained control over the environments' difficulty and also have the ground truth available for evaluating algorithms. We believe that devising algorithms that can detect such meta-features of environments and adapt to them will be key to creating robust RL algorithms that work in a variety of different real-world problems.
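
To make two of the meta-features named above concrete (delayed rewards and stochasticity), here is a minimal sketch of a toy environment. It is not the MDP Playground code or its API: the class name `DelayedRewardChain`, its parameters, and the chain layout are all hypothetical, chosen only for illustration.

```python
import random
from collections import deque


class DelayedRewardChain:
    """Hypothetical toy environment (not from the paper): a chain of states
    where the reward for reaching the last state arrives `delay` steps late."""

    def __init__(self, n_states=5, delay=2, transition_noise=0.0, seed=0):
        self.n_states = n_states
        self.delay = delay                        # meta-feature: reward delay
        self.transition_noise = transition_noise  # meta-feature: stochasticity
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.state = 0
        # Earned rewards wait in this queue for `delay` steps before release.
        self.pending = deque([0.0] * self.delay)
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left.
        # With probability `transition_noise` the action is flipped.
        if self.rng.random() < self.transition_noise:
            action = 1 - action
        move = 1 if action == 1 else -1
        self.state = min(self.n_states - 1, max(0, self.state + move))
        # Reward is earned now but only emitted after the delay.
        earned = 1.0 if self.state == self.n_states - 1 else 0.0
        self.pending.append(earned)
        reward = self.pending.popleft()
        return self.state, reward


def rollout(env, actions):
    """Reset the environment and return the rewards seen for a fixed action list."""
    env.reset()
    return [env.step(a)[1] for a in actions]
```

With `delay=0` the reward is visible on the step that earns it. With a positive delay the same action sequence yields the same rewards shifted later in time, so the current state alone no longer explains the reward just received.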


Petri Net Machines for Human-Agent Interaction

#artificialintelligence

Smart speakers and robots are becoming ever more prevalent in our daily lives. These agents are able to execute a wide range of tasks and actions and therefore need systems to control their execution. Current state-of-the-art approaches such as (deep) reinforcement learning, however, require vast amounts of training data, which are often hard to come by when interacting with humans. To overcome this issue, most systems still rely on Finite State Machines. We introduce Petri Net Machines, which present a formal definition for state machines based on Petri Nets. They are able to execute concurrent actions reliably, execute and interleave several plans at the same time, and provide an easy-to-use modelling language.
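
As a rough illustration of why Petri nets suit concurrent actions, the sketch below implements an ordinary place/transition net with token firing. It is not the Petri Net Machine formalism from the paper: the class `PetriNet`, the helper `build_fork_join_net`, and the place and transition names are assumptions made up for this example.

```python
class PetriNet:
    """Minimal place/transition net (illustrative only, not the paper's formalism)."""

    def __init__(self, marking):
        # marking maps place name -> number of tokens currently in that place
        self.marking = dict(marking)
        self.transitions = {}

    def add_transition(self, name, inputs, outputs):
        # inputs/outputs map place name -> tokens consumed/produced on firing
        self.transitions[name] = (dict(inputs), dict(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] -= n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n


def build_fork_join_net():
    """Hypothetical robot plan: speak and move concurrently, then finish."""
    net = PetriNet({"idle": 1})
    net.add_transition("fork", {"idle": 1}, {"speak_ready": 1, "move_ready": 1})
    net.add_transition("speak", {"speak_ready": 1}, {"speak_done": 1})
    net.add_transition("move", {"move_ready": 1}, {"move_done": 1})
    net.add_transition("join", {"speak_done": 1, "move_done": 1}, {"finished": 1})
    return net
```

After `fork` fires, both `speak` and `move` are enabled at once and may fire in either order, while `join` stays disabled until both have completed. A single finite state machine would need a separate state for every interleaving to express the same plan.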


BAFFLE: Blockchain based Aggregator Free Federated Learning

arXiv.org Machine Learning

A key aspect of Federated Learning (FL) is the requirement of a centralized aggregator to select and integrate models from various user devices. However, infeasibility of an aggregator due to a variety of operational constraints could prevent FL from being widely adopted. In this paper, we introduce BAFFLE, an aggregator free FL environment. Being powered by the blockchain, BAFFLE is inherently decentralized and successfully eliminates the constraints associated with an aggregator based FL framework. Our results indicate that BAFFLE provides superior performance while circumventing critical computational bottlenecks associated with the blockchain.


Learning Index Selection with Structured Action Spaces

arXiv.org Machine Learning

Configuration spaces for computer systems can be challenging for traditional and automatic tuning strategies. Injecting task-specific knowledge into the tuner for a task may allow for more efficient exploration of candidate configurations. We apply this idea to the task of index set selection to accelerate database workloads. Index set selection has been amenable to recent applications of vanilla deep RL, but real deployments remain out of reach. In this paper, we explore how learning index selection can be enhanced with task-specific inductive biases, specifically by encoding these inductive biases in better action structures. Index selection-specific action representations arise when the problem is reformulated in terms of permutation learning and we rely on recent work for learning RL policies on permutations. Through this approach, we build an indexing agent that is able to achieve improved indexing and validate its behavior with task-specific statistics. Early experiments reveal that our agent can find configurations that are up to 40% smaller for the same levels of latency as compared with other approaches and indicate more intuitive indexing behavior.


Deep Reinforcement Learning for Task-driven Discovery of Incomplete Networks

arXiv.org Machine Learning

Complex networks are often either too large for full exploration, partially accessible or partially observed. Downstream learning tasks on incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks and given resource collection constraints are of great interest. In this paper we formulate the task-specific network discovery problem in an incomplete network setting as a sequential decision making problem. Our downstream task is vertex classification. We propose a framework, called Network Actor Critic (NAC), which learns concepts of policy and reward in an offline setting via a deep reinforcement learning algorithm. A quantitative study is presented on several synthetic and real benchmarks. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms.