Goto

Collaborating Authors

 Reinforcement Learning


Reinforcement Learning-Enabled Decision-Making Strategies for a Vehicle-Cyber-Physical-System in Connected Environment

arXiv.org Artificial Intelligence

As a typical vehicle-cyber-physical-system (V-CPS), connected automated vehicles attracted more and more attention in recent years. This paper focuses on discussing the decision-making (DM) strategy for autonomous vehicles in a connected environment. First, the highway DM problem is formulated, wherein the vehicles can exchange information via wireless networking. Then, two classical reinforcement learning (RL) algorithms, Q-learning and Dyna, are leveraged to derive the DM strategies in a predefined driving scenario. Finally, the control performance of the derived DM policies in safety and efficiency is analyzed. Furthermore, the inherent differences of the RL algorithms are embodied and discussed in DM strategies.


Learn to Earn: Enabling Coordination within a Ride Hailing Fleet

arXiv.org Artificial Intelligence

The problem of optimizing social welfare objectives on multi sided ride hailing platforms such as Uber, Lyft, etc., is challenging, due to misalignment of objectives between drivers, passengers, and the platform itself. An ideal solution aims to minimize the response time for each hyper local passenger ride request, while simultaneously maintaining high demand satisfaction and supply utilization across the entire city. Economists tend to rely on dynamic pricing mechanisms that stifle price sensitive excess demand and resolve the supply demand imbalances emerging in specific neighborhoods. In contrast, computer scientists primarily view it as a demand prediction problem with the goal of preemptively repositioning supply to such neighborhoods using black box coordinated multi agent deep reinforcement learning based approaches. Here, we introduce explainability in the existing supply repositioning approaches by establishing the need for coordination between the drivers at specific locations and times. Explicit need based coordination allows our framework to use a simpler non deep reinforcement learning based approach, thereby enabling it to explain its recommendations ex post. Moreover, it provides envy free recommendations i.e., drivers at the same location and time do not envy one another's future earnings. Our experimental evaluation demonstrates the effectiveness, the robustness, and the generalizability of our framework. Finally, in contrast to previous works, we make available a reinforcement learning environment for end to end reproducibility of our work and to encourage future comparative studies.


Transfer Deep Reinforcement Learning-enabled Energy Management Strategy for Hybrid Tracked Vehicle

arXiv.org Artificial Intelligence

This paper proposes an adaptive energy management strategy for hybrid electric vehicles by combining deep reinforcement learning (DRL) and transfer learning (TL). This work aims to address the defect of DRL in tedious training time. First, an optimization control modeling of a hybrid tracked vehicle is built, wherein the elaborate powertrain components are introduced. Then, a bi-level control framework is constructed to derive the energy management strategies (EMSs). The upper-level is applying the particular deep deterministic policy gradient (DDPG) algorithms for EMS training at different speed intervals. The lower-level is employing the TL method to transform the pre-trained neural networks for a novel driving cycle. Finally, a series of experiments are executed to prove the effectiveness of the presented control framework. The optimality and adaptability of the formulated EMS are illuminated. The founded DRL and TL-enabled control policy is capable of enhancing energy efficiency and improving system performance.


Human-like Energy Management Based on Deep Reinforcement Learning and Historical Driving Experiences

arXiv.org Artificial Intelligence

Development of hybrid electric vehicles depends on an advanced and efficient energy management strategy (EMS). With online and real-time requirements in mind, this article presents a human-like energy management framework for hybrid electric vehicles according to deep reinforcement learning methods and collected historical driving data. The hybrid powertrain studied has a series-parallel topology, and its control-oriented modeling is founded first. Then, the distinctive deep reinforcement learning (DRL) algorithm, named deep deterministic policy gradient (DDPG), is introduced. To enhance the derived power split controls in the DRL framework, the global optimal control trajectories obtained from dynamic programming (DP) are regarded as expert knowledge to train the DDPG model. This operation guarantees the optimality of the proposed control architecture. Moreover, the collected historical driving data based on experienced drivers are employed to replace the DP-based controls, and thus construct the human-like EMSs. Finally, different categories of experiments are executed to estimate the optimality and adaptability of the proposed human-like EMS. Improvements in fuel economy and convergence rate indicate the effectiveness of the constructed control structure.


DRIFT: Deep Reinforcement Learning for Functional Software Testing

arXiv.org Artificial Intelligence

Efficient software testing is essential for productive software development and reliable user experiences. As human testing is inefficient and expensive, automated software testing is needed. In this work, we propose a Reinforcement Learning (RL) framework for functional software testing named DRIFT. DRIFT operates on the symbolic representation of the user interface. It uses Q-learning through Batch-RL and models the state-action value function with a Graph Neural Network. We apply DRIFT to testing the Windows 10 operating system and show that DRIFT can robustly trigger the desired software functionality in a fully automated manner. Our experiments test the ability to perform single and combined tasks across different applications, demonstrating that our framework can efficiently test software with a large range of testing objectives.


Meta-Gradient Reinforcement Learning with an Objective Discovered Online

arXiv.org Artificial Intelligence

Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency, eventually outperforming the median score of a strong actor-critic baseline.


Mixture of Step Returns in Bootstrapped DQN

arXiv.org Artificial Intelligence

The concept of utilizing multi-step returns for updating value functions has been adopted in deep reinforcement learning (DRL) for a number of years. Updating value functions with different backup lengths provides advantages in different aspects, including bias and variance of value estimates, convergence speed, and exploration behavior of the agent. Conventional methods such as TD-lambda leverage these advantages by using a target value equivalent to an exponential average of different step returns. Nevertheless, integrating step returns into a single target sacrifices the diversity of the advantages offered by different step return targets. To address this issue, we propose Mixture Bootstrapped DQN (MB-DQN) built on top of bootstrapped DQN, and uses different backup lengths for different bootstrapped heads. MB-DQN enables heterogeneity of the target values that is unavailable in approaches relying only on a single target value. As a result, it is able to maintain the advantages offered by different backup lengths. In this paper, we first discuss the motivational insights through a simple maze environment. In order to validate the effectiveness of MB-DQN, we perform experiments on the Atari 2600 benchmark environments, and demonstrate the performance improvement of MB-DQN over a number of baseline methods. We further provide a set of ablation studies to examine the impacts of different design configurations of MB-DQN.


Why supervised learning is more common than reinforcement learning

#artificialintelligence

Supervised learning is a more commonly used form of machine learning than reinforcement learning in part because it's a faster, cheaper form of machine learning. With data sets, a supervised learning model can be mapped to inputs and outputs to create image recognition or machine translation models. A reinforcement learning algorithm, on the other hand, must observe, and that can take time, said UC Berkeley professor Ion Stoica. Stoica works on robotics and reinforcement learning at UC Berkeley's RISELab, and if you're a developer working today, then you've likely used or come across some of his work that has built part of the modern infrastructure for machine learning. He spoke today as part of Transform, an annual AI event VentureBeat holds that this year takes place online.


Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators

arXiv.org Artificial Intelligence

Abstract-- Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects. The system is informed of the desired target object in the form of a single, arbitrary-pose RGB image of that object, enabling the system to generalize to unseen objects without retraining. To achieve such a system, we combine several advances in deep reinforcement learning and present a large-scale distributed training system using synchronous SGD that seamlessly scales to multi-node, multi-GPU infrastructure to make rapid prototyping easier. We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.


Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors

arXiv.org Artificial Intelligence

In this paper, we present a novel developmental reinforcement learning-based controller for a quadcopter with thrust vectoring capabilities. This multirotor UAV design has tilt-enabled rotors. It utilizes the rotor force magnitude and direction to achieve the desired state during flight. The control policy of this robot is learned using the policy transfer from the learned controller of the quadcopter (comparatively simple UAV design without thrust vectoring). This approach allows learning a control policy for systems with multiple inputs and multiple outputs. The performance of the learned policy is evaluated by physics-based simulations for the tasks of hovering and way-point navigation. The flight simulations utilize a flight controller based on reinforcement learning without any additional PID components. The results show faster learning with the presented approach as opposed to learning the control policy from scratch for this new UAV design created by modifications in a conventional quadcopter, i.e., the addition of more degrees of freedom (4-actuators in conventional quadcopter to 8-actuators in tilt-rotor quadcopter). We demonstrate the robustness of our learned policy by showing the recovery of the tilt-rotor platform in the simulation from various non-static initial conditions in order to reach a desired state. The developmental policy for the tilt-rotor UAV also showed superior fault tolerance when compared with the policy learned from the scratch. The results show the ability of the presented approach to bootstrap the learned behavior from a simpler system (lower-dimensional action-space) to a more complex robot (comparatively higher-dimensional action-space) and reach better performance faster.