Reinforcement Learning
Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones
Deep learning methods are replacing traditional software methods in solving real-world problems. Cheap, readily available computational power combined with large labeled datasets has enabled deep learning algorithms to show their full potential. The AlexNet paper (2012; Krizhevsky et al.[9]) showed that deep neural networks fed with sufficient data learn to extract representations that outperform handcrafted features, marking the start of an era known as the rise of deep learning. Their success in solving otherwise hard engineering problems such as object detection, voice recognition, chatbots, robotic manipulation and autonomous systems has shown that, thanks to their generalisation capability, they can be applied to many fields.[16] Path planning (motion planning) is defined as computing a continuous path from a starting position S to a destination position D while avoiding any known obstacles in the way.[20] Whether in 2D or 3D, a robotic system can then follow the computed path to reach its destination. Real-world robotic systems tend to use more explainable and reproducible algorithms based on graph search (A* or Dijkstra) or on sampling. We wanted to show that a reward-based algorithm built on a Markov Decision Process (MDP), which tries to maximize cumulative future reward, can also complete long-term path planning tasks. The advantage of this approach is that it allows an autonomous robot (in our case a simulated quadrotor) to create paths under non-holonomic constraints, something current methods fail to achieve.[1][17]
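The abstract frames path planning as maximizing cumulative future reward in an MDP. As a rough illustration of that idea only (tabular value iteration on a toy grid, not the authors' deep RL quadrotor setup), the sketch below assumes a hypothetical 5x5 occupancy grid, a reward of -1 per move, +10 for reaching D, and a discount factor of 0.95. A greedy walk over the resulting values then yields an obstacle-free path from S to D.

```python
import numpy as np

# Hypothetical 5x5 occupancy grid (1 = obstacle); start S=(0, 0), destination D=(4, 4).
GRID = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic transition: moves into walls or obstacles leave the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < GRID.shape[0] and 0 <= c < GRID.shape[1]) or GRID[r, c]:
        return state
    return (r, c)


def q_value(V, state, action, goal, gamma=0.95, step_reward=-1.0, goal_reward=10.0):
    """One-step lookahead: immediate reward plus discounted value of the next state."""
    nxt = step(state, action)
    reward = goal_reward if nxt == goal else step_reward
    return reward + gamma * V[nxt]


def value_iteration(goal, tol=1e-6):
    """Compute V(s) = max_a Q(s, a) for every free cell; the goal is terminal (V = 0)."""
    V = np.zeros(GRID.shape)
    while True:
        delta = 0.0
        for r in range(GRID.shape[0]):
            for c in range(GRID.shape[1]):
                if GRID[r, c] or (r, c) == goal:
                    continue  # obstacles are never entered; the goal is terminal
                best = max(q_value(V, (r, c), a, goal) for a in ACTIONS)
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < tol:
            return V


def greedy_path(V, start, goal):
    """Follow the action with the highest one-step lookahead value until D is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= GRID.size:
        s = path[-1]
        best_action = max(ACTIONS, key=lambda a: q_value(V, s, a, goal))
        path.append(step(s, best_action))
    return path


V = value_iteration(goal=(4, 4))
path = greedy_path(V, start=(0, 0), goal=(4, 4))
print(path)
```

Because every extra move costs reward and future reward is discounted, the greedy walk follows a shortest route. Deep RL replaces the table V with a learned network when the state space is too large to enumerate.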
A Survey on Autonomous Vehicle Control in the Era of Mixed-Autonomy: From Physics-Based to AI-Guided Driving Policy Learning
This paper serves as an introduction to and overview of artificial intelligence (AI) models and methodologies that are potentially useful in transportation engineering for autonomous vehicle (AV) control in the era of mixed autonomy. We will discuss state-of-the-art applications of AI-guided methods, identify opportunities and obstacles, raise open questions, and help suggest the building blocks and areas where AI could play a role in mixed autonomy. We divide AV deployment into four phases: pure human-driven vehicles (HVs), HV-dominated, AV-dominated, and pure AVs. This paper is primarily focused on the latter three phases. It is the first survey of its kind to comprehensively review literature in both transportation engineering and AI for mixed traffic modeling. Models used for each phase are summarized, encompassing game theory, deep (reinforcement) learning, and imitation learning. While reviewing the methodologies, we primarily focus on the following research questions: (1) What scalable driving policies can control a large number of AVs in mixed traffic comprising human drivers and uncontrollable AVs? (2) How do we estimate human driver behaviors? (3) How should the driving behavior of uncontrollable AVs be modeled in the environment? (4) How are the interactions between human drivers and autonomous vehicles characterized? Hopefully this paper will not only inspire our transportation community to rethink the conventional models that were developed in the data-shortage era, but also reach out to other disciplines, in particular robotics and machine learning, to join forces towards creating a safe and efficient mixed traffic ecosystem.
Vizarel: A System to Help Better Understand RL Agents
Deshpande, Shuby, Schneider, Jeff
Visualization tools for supervised learning have allowed users to interpret, introspect, and gain intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting. In this work, we describe our initial attempt at constructing a prototype of these ideas, through identifying possible features that such a system should encapsulate. Visualization systems at their core consist of two components: representation and interaction. Though these may appear to be disparate, it is hard to discount the influence each has on the other. The tools we use for representation affect how we interact with the system, and our interaction affects the representations that we create (Yi et al., 2007). Visualization interfaces should adhere to the human action cycle (Norman, 2013), which provides us …
EVO-RL: Evolutionary-Driven Reinforcement Learning
Hallawa, Ahmed, Born, Thorsten, Schmeink, Anke, Dartmann, Guido, Peine, Arne, Martin, Lukas, Iacca, Giovanni, Eiben, A. E., Ascheid, Gerd
In this work, we propose a novel approach for reinforcement learning driven by evolutionary computation. Our algorithm, dubbed Evolutionary-Driven Reinforcement Learning (evo-RL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour and purely learnable behaviour. Furthermore, we propose that this distinction be decided by the evolutionary process, thus allowing evo-RL to adapt to different environments. In addition, evo-RL facilitates learning in environments with rewardless states, which makes it better suited to real-world problems with incomplete information. To show that evo-RL leads to state-of-the-art performance, we present the performance of different state-of-the-art reinforcement learning algorithms when operating within evo-RL and compare it with the case when the same algorithms are executed independently. Results show that reinforcement learning algorithms embedded within our evo-RL approach significantly outperform the stand-alone versions of the same RL algorithms on OpenAI Gym control problems with rewardless states under the same computational budget.
Pre-trained Word Embeddings for Goal-conditional Transfer Learning in Reinforcement Learning
Hutsebaut-Buysse, Matthias, Mets, Kevin, Latré, Steven
Reinforcement learning (RL) algorithms typically start tabula rasa, without any prior knowledge of the environment and without any prior skills. This, however, often leads to low sample efficiency, requiring a large amount of interaction with the environment. This is especially true in a lifelong learning setting, in which the agent needs to continually extend its capabilities. In this paper, we examine how a pre-trained task-independent language model can make a goal-conditional RL agent more sample efficient. We do this by facilitating transfer learning between different related tasks. We experimentally demonstrate our approach on a set of object navigation tasks.
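One simple way pre-trained word embeddings can support transfer between related goals is to reuse what was learned for the known goal whose embedding is closest to a new goal. The sketch below is a deliberately simplified stand-in for the paper's goal-conditional agent: the three-dimensional embeddings, the goal words, and the per-goal parameter lists are all hypothetical, and the warm start by cosine similarity is our illustration rather than the authors' method.

```python
import math

# Hypothetical 3-d goal embeddings; real pre-trained vectors have hundreds of dimensions.
EMBEDDINGS = {
    "cup": [0.9, 0.1, 0.0],
    "mug": [0.85, 0.15, 0.05],
    "chair": [0.0, 0.9, 0.3],
    "sofa": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def closest_known_goal(new_goal, known_goals):
    """Return the already-learned goal whose embedding is most similar to new_goal."""
    target = EMBEDDINGS[new_goal]
    return max(known_goals, key=lambda g: cosine(EMBEDDINGS[g], target))


# Hypothetical parameters learned earlier for two goals.
learned_params = {"cup": [0.2, -0.5, 1.0], "chair": [1.1, 0.3, -0.4]}

# Warm-start the new goal "mug" from its nearest known goal instead of from scratch.
source = closest_known_goal("mug", learned_params)
init_params = list(learned_params[source])
print(source, init_params)
```

Because "mug" lies close to "cup" in the embedding space, the agent starts from the cup policy rather than tabula rasa; a goal-conditional network achieves a softer version of this by taking the embedding itself as input.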
Integrating Logical Rules Into Neural Multi-Hop Reasoning for Drug Repurposing
Liu, Yushan, Hildebrandt, Marcel, Joblin, Mitchell, Ringsquandl, Martin, Tresp, Volker
The graph structure of biomedical data differs from that of typical knowledge graph benchmark tasks. A particular property of biomedical data is the presence of long-range dependencies, which can be captured by patterns described as logical rules. We propose a novel method that combines these rules with a neural multi-hop reasoning approach that uses reinforcement learning. We conduct an empirical study based on the real-world task of drug repurposing by formulating this task as a link prediction problem. We apply our method to the biomedical knowledge graph Hetionet and show that our approach outperforms several baseline methods.
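A logical rule over a knowledge graph can be read as a sequence of relations, for example the hypothetical rule treats(C, D) ← binds(C, G) ∧ associates(G, D). The sketch below covers only this rule-matching side, assuming a made-up toy graph: it counts the paths that satisfy a rule body starting from one compound, which gives a crude score for candidate drug-disease links. The paper's reinforcement-learning path-finding agent and its Hetionet setup are not shown.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("compound_A", "binds", "gene_X"),
    ("compound_A", "binds", "gene_Z"),
    ("gene_X", "associates", "disease_Y"),
    ("gene_Z", "associates", "disease_Y"),
    ("gene_Z", "associates", "disease_W"),
]


def build_graph(triples):
    """Index outgoing edges as graph[head][relation] -> list of tails."""
    graph = defaultdict(lambda: defaultdict(list))
    for h, r, t in triples:
        graph[h][r].append(t)
    return graph


def follow_rule(graph, start, rule_body):
    """Count rule-consistent paths from start to each entity reached at the end.

    rule_body is a sequence of relations, e.g. ("binds", "associates").
    """
    frontier = {start: 1}  # entity -> number of paths reaching it so far
    for relation in rule_body:
        nxt = defaultdict(int)
        for entity, count in frontier.items():
            for tail in graph[entity][relation]:
                nxt[tail] += count
        frontier = nxt
    return dict(frontier)


graph = build_graph(TRIPLES)
scores = follow_rule(graph, "compound_A", ("binds", "associates"))
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Here disease_Y is supported by two rule-consistent paths and disease_W by one. Enumerating paths like this becomes expensive for long rules on a large graph, which is one motivation for letting a learned agent choose which edges to follow.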
Learning Accurate and Human-Like Driving using Semantic Maps and Attention
Hecker, Simon, Dai, Dengxin, Liniger, Alexander, Van Gool, Luc
This paper investigates how end-to-end driving models can be improved to drive more accurately and in a more human-like way. To tackle the first issue, we exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them. The maps are used in an attention mechanism that promotes segmentation confidence masks, thus focusing the network on the semantic classes in the image that are important for the current driving situation. Human-like driving is achieved using adversarial learning: not only by minimizing the imitation loss with respect to the human driver, but also by defining a discriminator that forces the driving model to produce human-like action sequences. Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data. Extensive experiments show that our driving models are more accurate and behave in a more human-like way than previous methods.
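The combination of an imitation loss and a discriminator can be written as a single min-max objective. The formulation below is a generic GAN-style sketch in our own notation, not the paper's exact loss: π_θ is the driving model mapping inputs x to action sequences, D_ψ is the discriminator, 𝒟 is the human driving data, and the squared-error imitation term and the weight λ are assumptions.

```latex
\min_{\theta}\max_{\psi}\;
\underbrace{\mathbb{E}_{(x,a)\sim\mathcal{D}}\big[\lVert \pi_\theta(x) - a \rVert^2\big]}_{\text{imitation loss}}
+ \lambda\Big(
  \mathbb{E}_{(x,a)\sim\mathcal{D}}\big[\log D_\psi(x, a)\big]
  + \mathbb{E}_{x\sim\mathcal{D}}\big[\log\big(1 - D_\psi(x, \pi_\theta(x))\big)\big]
\Big)
```

The driving model lowers the objective both by matching the human actions and by making D_ψ rate its own action sequences as human, while D_ψ tries to tell the two apart.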
MAPS: Multi-agent Reinforcement Learning-based Portfolio Management System
Lee, Jinho, Kim, Raehyun, Yi, Seok-Won, Kang, Jaewoo
Generating an investment strategy using advanced deep learning methods in stock markets has recently been a topic of interest. Most existing deep learning methods focus on proposing an optimal model or network architecture by maximizing return. However, these models often fail to consider and adapt to the continuously changing market conditions. In this paper, we propose the Multi-Agent reinforcement learning-based Portfolio management System (MAPS). MAPS is a cooperative system in which each agent is an independent "investor" creating its own portfolio. In the training procedure, each agent is guided to act as diversely as possible while maximizing its own return with a carefully designed loss function. As a result, MAPS as a system ends up with a diversified portfolio. Experiment results with 12 years of US market data show that MAPS outperforms most of the baselines in terms of Sharpe ratio. Furthermore, our results show that adding more agents to our system would allow us to get a higher Sharpe ratio by lowering risk with a more diversified portfolio.
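The Sharpe ratio used in the evaluation is the mean excess return divided by its standard deviation. The sketch below uses made-up daily returns for two hypothetical agents and assumes a zero risk-free rate and 252 trading days per year. It shows the diversification effect the abstract describes: averaging two imperfectly correlated return streams lowers volatility and can raise the Sharpe ratio. It does not reproduce MAPS's loss function or its data.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


# Hypothetical daily returns of two agents' portfolios (not real market data).
agent_a = [0.010, -0.004, 0.006, -0.002, 0.008, -0.006]
agent_b = [-0.002, 0.008, -0.004, 0.010, -0.004, 0.006]

# Equal-weight combination of the two agents' portfolios.
combined = [(a + b) / 2 for a, b in zip(agent_a, agent_b)]

for name, series in [("agent_a", agent_a), ("agent_b", agent_b), ("combined", combined)]:
    print(f"{name}: Sharpe = {sharpe_ratio(series):.2f}")
```

The toy numbers give unrealistically large annualized values; only the comparison between the single agents and the combined portfolio matters here.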
Representations for Stable Off-Policy Reinforcement Learning
Ghosh, Dibya, Bellemare, Marc G.
Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee stability. In this paper, we formally show that there are indeed nontrivial state representations under which the canonical TD algorithm is stable, even when learning off-policy. We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation. In the most general case, we show that a Schur basis provides convergence guarantees, but is difficult to estimate from samples. For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice. We conclude by empirically demonstrating that these stable representations can be learned using stochastic gradient descent, opening the door to improved techniques for representation learning with deep networks.
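The Krylov subspace generated by a policy's transition matrix P and a reward vector r, span{r, Pr, ..., P^(k-1) r}, can be given an orthonormal basis with a QR decomposition. The sketch below assumes that construction and a random 5-state Markov chain; it only builds the features and does not reproduce the paper's stability analysis or its SGD-based learning of the representation.

```python
import numpy as np


def krylov_basis(P, r, k):
    """Orthonormal basis of span{r, P r, ..., P^(k-1) r}.

    P: (n, n) row-stochastic transition matrix; r: (n,) reward vector.
    Returns (Q, K): Q is (n, k) with orthonormal columns, K holds the raw Krylov vectors.
    """
    vectors = [r]
    for _ in range(k - 1):
        vectors.append(P @ vectors[-1])
    K = np.stack(vectors, axis=1)  # columns r, Pr, P^2 r, ...
    Q, _ = np.linalg.qr(K)         # reduced QR orthonormalizes the columns
    return Q, K


rng = np.random.default_rng(0)
n_states, k = 5, 3
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)  # normalize rows so P is a valid transition matrix
r = rng.random(n_states)

Phi, K = krylov_basis(P, r, k)     # Phi: one k-dimensional feature vector per state
print(Phi.round(3))
```

For an ergodic chain the vectors P^j r approach a constant vector as j grows, so for larger k the Krylov columns become nearly linearly dependent and the basis is numerically fragile.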
Learning to plan with uncertain topological maps
Beeching, Edward, Dibangoye, Jilles, Simonin, Olivier, Wolf, Christian
We train an agent to navigate in 3D environments using a hierarchical strategy including a high-level graph based planner and a local policy. Our main contribution is a data driven learning based approach for planning under uncertainty in topological maps, requiring an estimate of shortest paths in valued graphs with a probabilistic structure. Whereas classical symbolic algorithms achieve optimal results on noise-less topologies, or optimal results in a probabilistic sense on graphs with probabilistic structure, we aim to show that machine learning can overcome missing information in the graph by taking into account rich high-dimensional node features, for instance visual information available at each location of the map. Compared to purely learned neural white box algorithms, we structure our neural model with an inductive bias for dynamic programming based shortest path algorithms, and we show that a particular parameterization of our neural model corresponds to the Bellman-Ford algorithm. By performing an empirical analysis of our method in simulated photo-realistic 3D environments, we demonstrate that the inclusion of visual features in the learned neural planner outperforms classical symbolic solutions for graph based planning.
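For reference, the classical Bellman-Ford algorithm, which the abstract relates to a parameterization of the neural planner, repeatedly relaxes every edge; each pass is a dynamic-programming update of the shortest-path estimates. The sketch below is the textbook version on a hypothetical four-node graph with known weights. In the paper's setting the graph is uncertain and visual node features inform the planner, which this code does not model.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Single-source shortest paths over (u, v, weight) edges.

    Returns (dist, pred). Raises ValueError if a reachable negative cycle exists.
    """
    dist = [math.inf] * num_nodes
    pred = [None] * num_nodes
    dist[source] = 0.0
    # At most |V| - 1 passes; each pass relaxes every edge once.
    for _ in range(num_nodes - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                updated = True
        if not updated:
            break  # estimates have converged early
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist, pred


def reconstruct_path(pred, target):
    """Walk predecessor links back to the source (assumes target is reachable)."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


# Hypothetical graph: 4 nodes, directed weighted edges.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
dist, pred = bellman_ford(4, edges, source=0)
print(dist, reconstruct_path(pred, 3))
```

The shortest route to node 3 goes through node 2 and then node 1. A learned planner of the kind the abstract describes would, roughly speaking, replace the fixed weights with estimates informed by the node features.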