Reinforcement Learning


Learn to Navigate Maplessly with Varied LiDAR Configurations: A Support Point Based Approach

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) demonstrates great potential in the mapless navigation domain. However, such navigation models are normally restricted to a fixed range-sensor configuration because their input format is fixed. In this paper, we propose a DRL model that can handle range data obtained from different range sensors with different installation positions. Our model first extracts goal-directed features from each obstacle point. It then selects global obstacle features from all point-feature candidates and uses these features for the final decision. Because only a few points support the final decision, we refer to these points as support points and to our approach as support-point based navigation (SPN). Our model can handle data from different LiDAR setups and performs well in both simulation and real-world experiments. It can also be used to guide the installation of range sensors to improve robot navigation performance.
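The abstract does not say exactly how global features are chosen from the per-point candidates. One common way to make a network indifferent to the number and order of input points is a shared per-point encoder followed by channel-wise max pooling; under that assumption, the points that win the max in at least one channel could be read as "support points". The sketch below assumes exactly this, and the random weights, feature sizes, and helper names (`point_features`, `select_support_points`) are hypothetical rather than taken from the SPN paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared encoder: (x, y, goal_dx, goal_dy) -> 16 features per point
W1 = rng.normal(size=(4, 32))
W2 = rng.normal(size=(32, 16))


def point_features(points, goal):
    """Goal-directed features for each obstacle point, with weights shared across points.

    points: (N, 2) obstacle coordinates in the robot frame; N may differ per sensor.
    goal:   (2,) goal position in the robot frame.
    """
    rel_goal = goal[None, :] - points                 # goal as seen from each point
    x = np.concatenate([points, rel_goal], axis=1)    # (N, 4)
    h = np.maximum(x @ W1, 0.0)                       # ReLU hidden layer
    return h @ W2                                     # (N, 16)


def select_support_points(feats):
    """Channel-wise max pooling (an assumption, see lead-in).

    Returns the fixed-size global feature and the indices of the points
    that supplied at least one channel, i.e. the candidate support points.
    """
    winners = feats.argmax(axis=0)                    # winning point index per channel
    global_feat = feats[winners, np.arange(feats.shape[1])]
    return global_feat, np.unique(winners)


goal = np.array([3.0, 1.0])
# Two hypothetical LiDAR setups producing different numbers of returns
scan_a = rng.uniform(-5, 5, size=(360, 2))
scan_b = rng.uniform(-5, 5, size=(90, 2))

for scan in (scan_a, scan_b):
    g, support = select_support_points(point_features(scan, goal))
    print(g.shape, len(support))  # global feature size does not depend on N
```

Because the pooled feature has a fixed size regardless of how many points arrive, the same downstream policy head could consume scans from different LiDAR configurations; in this sketch at most 16 points can act as support points.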


Using log analysis to drive experiments and win the AWS DeepRacer F1 ProAm Race

#artificialintelligence

This is a guest post by Ray Goh, a tech executive at DBS Bank. AWS DeepRacer is an autonomous 1/18th scale race car powered by reinforcement learning, and the AWS DeepRacer League is the world's first global autonomous racing league. It's a fun and easy way to get started with machine learning (ML), regardless of skill or background. For companies, it's also a powerful platform to facilitate teaching ML to employees at the enterprise level. We've partnered with AWS to bring the AWS DeepRacer League to DBS to train over 3,000 employees in AI and ML by the end of 2020.


Introduction to Model-Based Reinforcement Learning

#artificialintelligence

In this article, I want to give an introduction to Model-Based Reinforcement Learning (MB-RL): the fundamental concept behind it, the benefits of these methods, their applications, and the challenges and difficulties that come with applying MB-RL to your problem. In artificial intelligence (AI), sequential decision-making, commonly formalized as a Markov decision process (MDP), is one of the key challenges. Reinforcement learning and planning have been two successful approaches to solving these problems. A logical step is to combine both methods in order to obtain the advantages of each and, ideally, eliminate their disadvantages.
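One classic way to combine learning and planning is Dyna-Q, shown here only as an illustration of the idea, not necessarily the method the article goes on to describe. The agent updates a Q-table from real transitions, records each transition in a learned model (here a deterministic lookup table), and then performs extra "planning" updates on experience replayed from that model. The corridor environment, episode counts, and hyperparameters are assumptions chosen to keep the sketch short.

```python
import random

N_STATES, GOAL = 6, 5      # corridor states 0..5; reaching state 5 gives reward 1
ACTIONS = (-1, +1)         # move left or move right


def env_step(s, a):
    """Deterministic corridor dynamics: returns (next_state, reward, done)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> (reward, next_state, done)

    def greedy(s):
        best = max(Q[(s, b)] for b in ACTIONS)
        return rng.choice([b for b in ACTIONS if Q[(s, b)] == best])  # random tie-break

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r, done = env_step(s, a)        # real experience
            update(s, a, r, s2, done)           # direct RL (model-free) update
            model[(s, a)] = (r, s2, done)       # model learning
            for _ in range(planning_steps):     # planning on simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q


Q = dyna_q()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-terminal state
```

The `planning_steps` parameter is the knob that trades computation for real-environment samples: with it set to 0 the loop reduces to plain Q-learning.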


smearle/gym-city

#artificialintelligence

The work in this repo was presented and demoed at the 2019 Experimental A.I. in Games (EXAG) workshop, an AIIDE workshop. Feel free to join the conversation surrounding this work via my Twitter and on r/MachineLearning. The player places urban structures on a 2D map. In certain configurations, these structures attract population and vertical development. Reinforcement learning agents are rewarded as a function of population or other city-wide metrics.


AWAC: accelerating online reinforcement learning with offline datasets

AIHub

Robots trained with reinforcement learning (RL) have the potential to be used across a huge variety of challenging real-world problems. To apply RL to a new problem, you typically set up the environment, define a reward function, and train the robot to solve the task by allowing it to explore the new environment from scratch. While this may eventually work, these "online" RL methods are data-hungry, and repeating this data-inefficient process for every new problem makes it difficult to apply online RL to real-world robotics problems. What if, instead of repeating the data collection and learning process from scratch every time, we were able to reuse data across multiple problems or experiments? By doing so, we could greatly reduce the burden of data collection for every new problem we encounter.


A Reinforcement Learning Approach to Health Aware Control Strategy

arXiv.org Artificial Intelligence

Health-aware control (HAC) has emerged as one of the domains where control synthesis is sought based on failure prognostics of the system or its components, or on Remaining Useful Life (RUL) predictions of critical components. Because mathematical dynamic (transition) models of RUL are rarely available, it is difficult to incorporate RUL information into the control paradigm. This paper presents a novel framework for health-aware control in which a reinforcement learning based approach learns an optimal control policy in the face of component degradation by integrating global system transition data (generated by an analytical model that mimics the real system) and RUL predictions. The RUL prediction generated at each step is tracked to a desired value of RUL. The latter is integrated within a cost function, which is maximized to learn the optimal control. The proposed method is studied using a simulation of a DC motor and shaft wear.
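The abstract says the per-step RUL prediction is tracked to a desired RUL value inside the learning objective but does not give its exact form. A minimal sketch of one possible per-step reward follows, assuming quadratic penalties on both the control tracking error and the RUL deviation; the class name and the weights `w_track` and `w_rul` are hypothetical. An agent maximizing the sum of these rewards is pushed to meet its control target while keeping predicted RUL close to the desired value.

```python
from dataclasses import dataclass


@dataclass
class HealthAwareReward:
    """Hypothetical per-step reward for health-aware control (not the paper's exact form).

    Penalizes (1) deviation of the controlled output from its reference and
    (2) deviation of the predicted RUL from the desired RUL value.
    """
    rul_target: float      # desired remaining useful life, same units as the predictions
    w_track: float = 1.0   # weight on output tracking error (assumption)
    w_rul: float = 0.1     # weight on RUL deviation (assumption)

    def __call__(self, y: float, y_ref: float, rul_pred: float) -> float:
        tracking_err = y - y_ref
        rul_err = rul_pred - self.rul_target
        # Negative quadratic penalty, so larger is better for a maximizing agent
        return -(self.w_track * tracking_err ** 2 + self.w_rul * rul_err ** 2)


reward = HealthAwareReward(rul_target=1000.0)
# Same tracking error in both calls, but the second predicts more consumed life.
print(reward(y=0.98, y_ref=1.0, rul_pred=1000.0))
print(reward(y=0.98, y_ref=1.0, rul_pred=950.0))
```

The ratio `w_rul / w_track` sets how much control accuracy the agent is willing to give up to slow degradation; in practice it would have to be tuned for the specific system.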


Every Hidden Unit Maximizing Output Weights Maximizes The Global Reward

arXiv.org Artificial Intelligence

For a network of stochastic units trained on a reinforcement learning task, one biologically plausible way of learning is to treat each unit as a reinforcement learning unit and train each unit with REINFORCE using the same global reward signal. In this case, only a global reward signal has to be broadcast to all units, and the learning rule is local. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and cannot be used to train a deep network in practice. In this paper, we propose an algorithm called Weight Maximization, which can significantly speed up learning compared with applying REINFORCE to all units. Essentially, we replace the global reward to each hidden unit with the change in the norm of its output weights, so that each hidden unit in the network tries to maximize the norm of its output weights instead of the global reward. We found that the new algorithm can solve simple reinforcement learning tasks significantly faster than the baseline model. We also prove that the resulting learning rule approximately follows gradient ascent on the reward in expectation when applied to a multi-layer network of Bernoulli logistic units. It illustrates an example of intelligent behavior arising from a population of self-interested hedonistic neurons, which corresponds to Klopf's hedonistic neuron hypothesis.
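A minimal sketch of the idea, assuming a two-layer network of Bernoulli logistic units with a single output unit and using the squared L2 norm of each hidden unit's outgoing weights (the abstract only says "norm"). The output unit is trained with plain REINFORCE on the global reward, using the Bernoulli log-likelihood gradient (a - p) times its input. Each hidden unit uses the same REINFORCE form, but its reward is the change in its outgoing-weight norm caused by the output unit's update. The XOR-style task, sizes, and learning rates are hypothetical, and this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BernoulliNet:
    """Two-layer network of Bernoulli logistic units with one output unit."""

    def __init__(self, n_in, n_hidden, lr_hidden=0.1, lr_out=0.1):
        self.W_h = rng.normal(scale=0.5, size=(n_hidden, n_in + 1))  # last column: bias
        self.W_o = rng.normal(scale=0.5, size=(n_hidden + 1,))       # last entry: bias
        self.lr_h, self.lr_o = lr_hidden, lr_out

    def act(self, x):
        x1 = np.append(x, 1.0)
        p_h = sigmoid(self.W_h @ x1)
        h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden unit outputs
        h1 = np.append(h, 1.0)
        p_o = sigmoid(self.W_o @ h1)
        a = float(rng.random() < p_o)                      # sample the action
        return x1, p_h, h, h1, p_o, a

    def learn(self, x1, p_h, h, h1, p_o, a, reward):
        # Squared outgoing weight of each hidden unit before the output update
        old_sq = self.W_o[:-1] ** 2
        # Output unit: REINFORCE with the global reward
        self.W_o += self.lr_o * reward * (a - p_o) * h1
        # Hidden units: Weight Maximization reward = change in outgoing-weight norm
        local_r = self.W_o[:-1] ** 2 - old_sq
        self.W_h += self.lr_h * (local_r * (h - p_h))[:, None] * x1[None, :]


net = BernoulliNet(n_in=2, n_hidden=8)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 0], dtype=float)   # XOR labels

rewards = []
for step in range(5000):
    i = rng.integers(4)
    trace = net.act(inputs[i])
    reward = 1.0 if trace[-1] == targets[i] else -1.0   # global reward
    net.learn(*trace, reward)
    rewards.append(reward)
print(np.mean(rewards[-1000:]))  # average global reward over the last 1000 steps
```

Note that only the output unit ever sees the global reward; every hidden unit learns from a signal computed from its own outgoing weights, which is what keeps the rule local.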


Deep Reinforcement Learning for Adaptive Network Slicing in 5G for Intelligent Vehicular Systems and Smart Cities

arXiv.org Artificial Intelligence

Intelligent vehicular systems and smart city applications are the fastest-growing Internet of Things (IoT) implementations, with a compound annual growth rate of 30%. In view of recent advances in IoT devices and the emerging new breed of IoT applications driven by artificial intelligence (AI), the fog radio access network (F-RAN) has recently been introduced for fifth-generation (5G) wireless communications to overcome the latency limitations of the cloud RAN (C-RAN). We consider the network slicing problem of allocating the limited resources at the network edge (fog nodes) to vehicular and smart city users with heterogeneous latency and computing demands in dynamic environments. We develop a network slicing model based on a cluster of fog nodes (FNs) coordinated by an edge controller (EC) to efficiently utilize the limited resources at the network edge. For each service request in a cluster, the EC decides either which FN executes the task, i.e., serves the request locally at the edge, or to reject the task and refer it to the cloud. We formulate the problem as an infinite-horizon Markov decision process (MDP) and propose a deep reinforcement learning (DRL) solution to adaptively learn the optimal slicing policy. The performance of the proposed DRL-based slicing method is evaluated by comparing it with other slicing approaches in dynamic environments and for different design objectives. Comprehensive simulation results corroborate that the proposed DRL-based EC quickly learns the optimal policy through interaction with the environment, which enables adaptive and automated network slicing for efficient resource allocation in dynamic vehicular and smart city environments.
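To make the edge controller's decision concrete, here is a toy version of the MDP: each incoming request is either assigned to one of K fog nodes that has enough free resource units, or rejected and referred to the cloud. The sketch swaps the paper's deep network for a Q-table so it stays small, and the node count, capacities, request types, utilities, release probability, and discount factor are all hypothetical choices rather than the paper's model.

```python
import random
from collections import defaultdict

K, CAPACITY, P_RELEASE = 2, 3, 0.3
# Hypothetical request types: (resource units needed, utility if served at the edge)
REQUESTS = {0: (2, 10.0),   # latency-critical
            1: (1, 2.0)}    # delay-tolerant
REJECT = K                  # action index meaning "refer the request to the cloud"


class SlicingEnv:
    """Toy continuing-task environment for the edge controller (hypothetical)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.used = [0] * K
        self.req = self.rng.choice(list(REQUESTS))
        return self._state()

    def _state(self):
        # Free units per fog node, plus the type of the pending request
        return tuple(CAPACITY - u for u in self.used) + (self.req,)

    def step(self, action):
        need, utility = REQUESTS[self.req]
        if action == REJECT:
            reward = 0.0
        elif self.used[action] + need <= CAPACITY:
            self.used[action] += need
            reward = utility
        else:
            reward = -1.0   # infeasible edge assignment: penalized, nothing allocated
        # Each busy unit finishes independently, then the next request arrives
        self.used = [u - sum(self.rng.random() < P_RELEASE for _ in range(u))
                     for u in self.used]
        self.req = self.rng.choice(list(REQUESTS))
        return self._state(), reward


def train(steps=30_000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with a discounted infinite-horizon objective."""
    env, rng = SlicingEnv(seed), random.Random(seed + 1)
    Q = defaultdict(lambda: [0.0] * (K + 1))
    s = env.reset()
    for _ in range(steps):
        q = Q[s]
        a = rng.randrange(K + 1) if rng.random() < eps else q.index(max(q))
        s2, r = env.step(a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q


Q = train()
print(len(Q), Q[(3, 3, 0)])  # visited states; action values with both nodes free
```

The infinite-horizon discount matters here: serving a low-utility request now can block a later high-utility one, and only the `gamma * max(Q[s2])` term lets the controller account for that opportunity cost.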


Implicit Distributional Reinforcement Learning

arXiv.org Machine Learning

To improve the sample efficiency of policy-gradient based reinforcement learning algorithms, we propose implicit distributional actor-critic (IDAC), which consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution. We adopt a distributional perspective on the discounted cumulative return and model it with a state-action-dependent implicit distribution, which is approximated by the DGNs that take state-action pairs and random noise as their input. Moreover, we use the SIA to provide a semi-implicit policy distribution, which mixes the policy parameters with a reparameterizable distribution that is not constrained by an analytic density function. In this way, the policy's marginal distribution is implicit, providing the potential to model complex properties such as covariance structure and skewness, but its parameters and entropy can still be estimated. We incorporate these features into an off-policy algorithm framework to solve problems with continuous action spaces and compare IDAC with state-of-the-art algorithms on representative OpenAI Gym environments. We observe that IDAC outperforms these baselines in most tasks.
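A semi-implicit policy can be sketched as a two-stage sampler: draw noise ε, map (state, ε) to a mean, then draw the action from a reparameterizable Gaussian around that mean. The marginal π(a|s) = E_ε[N(a; μ(s, ε), σ²I)] has no closed form, but it can still be estimated by Monte Carlo averaging over ε, which is the property the abstract alludes to. The linear-plus-tanh mixing map, the dimensions, and the fixed σ below are assumptions for illustration, not IDAC's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NOISE_DIM, ACTION_DIM = 3, 2, 2

# Hypothetical mixing network: mu(s, eps) = tanh(A s + B eps)
A = rng.normal(size=(ACTION_DIM, STATE_DIM))
B = rng.normal(size=(ACTION_DIM, NOISE_DIM))
SIGMA = 0.1   # fixed std of the explicit Gaussian layer (assumption)


def mixing_mean(state, eps):
    """Conditional Gaussian means for a batch of noise draws eps of shape (M, NOISE_DIM)."""
    return np.tanh(state @ A.T + eps @ B.T)   # (M, ACTION_DIM)


def sample_action(state, n=1):
    """Reparameterized sampling: a = mu(s, eps) + SIGMA * xi, with eps, xi ~ N(0, I)."""
    eps = rng.normal(size=(n, NOISE_DIM))
    xi = rng.normal(size=(n, ACTION_DIM))
    return mixing_mean(state, eps) + SIGMA * xi


def marginal_log_density(action, state, n_mc=5000):
    """Monte Carlo estimate of log pi(a|s) = log E_eps[N(a; mu(s, eps), SIGMA^2 I)]."""
    eps = rng.normal(size=(n_mc, NOISE_DIM))
    mu = mixing_mean(state, eps)
    sq = np.sum((action - mu) ** 2, axis=1)
    log_n = -0.5 * sq / SIGMA**2 - ACTION_DIM * np.log(SIGMA * np.sqrt(2 * np.pi))
    m = log_n.max()
    return m + np.log(np.mean(np.exp(log_n - m)))   # log-mean-exp for numerical stability


s = rng.normal(size=STATE_DIM)
actions = sample_action(s, n=1000)
print(np.cov(actions.T))                    # sample covariance; off-diagonals need not be zero
print(marginal_log_density(actions[0], s))  # estimated log-density of one sampled action
```

Sampling stays cheap and differentiable through ε and ξ, while anything that needs the density (for example an entropy term) has to rely on estimates like `marginal_log_density`.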


Using Reinforcement Learning to Allocate and Manage Service Function Chains in Cellular Networks

arXiv.org Machine Learning

Next-generation cellular networks are expected to provide a connected society with full mobility to empower socio-economic transformation. Several other technologies will benefit from this evolution, such as the Internet of Things, smart cities, smart agriculture, vehicular networks, and healthcare applications. Each of these scenarios presents specific requirements and demands different network configurations. To deal with this heterogeneity, virtualization is a key technology. Indeed, the network function virtualization (NFV) paradigm gives the network manager flexibility, allocates resources according to demand, and reduces acquisition and operational costs. In addition, it is possible to specify an ordered set of virtual network functions (VNFs) for a given service, which is called a service function chain (SFC). However, despite the advantages of service virtualization, network performance and availability should not be degraded by its use. In this paper, we propose using reinforcement learning to deploy an SFC of a cellular network service and manage the operation of its VNFs. We consider an SFC deployed by the reinforcement learning agent in a scenario with distributed data centers, where the VNFs are deployed in virtual machines on commodity servers. NFV management involves creating, deleting, and restarting VNFs. The main purpose is to reduce the number of lost packets while taking into account the energy consumption of the servers. We use the Proximal Policy Optimization (PPO) algorithm to implement the agent, and preliminary results show that the agent is able to allocate the SFC and manage the VNFs, reducing the number of lost packets.
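The abstract names Proximal Policy Optimization (PPO) as the agent's learning algorithm. Its core is the clipped surrogate objective L = mean(min(r·Â, clip(r, 1 - ε, 1 + ε)·Â)), where r = π_new(a|s) / π_old(a|s) is the probability ratio and Â is the advantage estimate. A minimal NumPy sketch of that objective follows; the function name, the ε = 0.2 default, and the example numbers are illustrative, and the paper's SFC-specific state, action, and reward design is not reproduced.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates A_hat for the same transitions.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: keep the smaller of the two terms for each sample
    return np.mean(np.minimum(unclipped, clipped))


# Hypothetical batch of three transitions (e.g. create / delete / restart VNF decisions)
logp_old = np.log(np.array([0.5, 0.5, 0.5]))
logp_new = np.log(np.array([0.8, 0.4, 0.5]))   # ratios ≈ 1.6, 0.8, 1.0
adv = np.array([1.0, -1.0, 2.0])
print(ppo_clip_objective(logp_new, logp_old, adv))  # ≈ 0.8 (first gain clipped at 1.2)
```

The clipping removes the incentive to move the ratio outside [1 - ε, 1 + ε] in the direction the advantage favors, which is what keeps each policy update conservative.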