 Reinforcement Learning


Model-Based Active Exploration

arXiv.org Artificial Intelligence

Efficient exploration is an unsolved problem in Reinforcement Learning. We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
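The idea of seeking novelty deliberately can be illustrated with a small sketch. It assumes an ensemble of learned dynamics models and scores each action by how much the ensemble members disagree about the next state. The variance-based score, the one-step lookahead, and the names `disagreement` and `most_novel_action` are illustrative assumptions and not the utility or planner used in the paper.

```python
from statistics import pvariance


def disagreement(models, state, action):
    """Population variance of the ensemble's next-state predictions.

    Assumption: variance stands in for the novelty measure; the paper's
    actual utility is not reproduced here.
    """
    predictions = [model(state, action) for model in models]
    return pvariance(predictions)


def most_novel_action(models, state, actions):
    """Pick the action whose outcome the ensemble is least sure about."""
    return max(actions, key=lambda a: disagreement(models, state, a))


# Hypothetical 1-D ensemble: the models agree on action 0 and differ on action 1.
models = [
    lambda s, a: s + a * 1.0,
    lambda s, a: s + a * 2.0,
    lambda s, a: s + a * 3.0,
]

if __name__ == "__main__":
    print(most_novel_action(models, 0.0, [0, 1]))
```

A full planner would roll the models forward over several steps rather than a single one; the sketch only shows why disagreement can serve as a target for exploration.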


Social Vehicle Swarms: A Novel Perspective on Social-aware Vehicular Communication Architecture

arXiv.org Artificial Intelligence

Abstract--Internet of vehicles is a promising area related to D2D communication and internet of things. We present a novel perspective for vehicular communications, social vehicle swarms, to study and analyze socially aware internet of vehicles with the assistance of an agent-based model intended to reveal hidden patterns behind superficial data. After discussing its components, namely its agents, environments, and rules, we introduce supportive technology and methods, deep reinforcement learning, privacy preserving data mining and sub-cloud computing, in order to detect the most significant and interesting information for each individual effectively, which is the key desire. Finally, several relevant research topics and challenges are discussed. Internet of vehicles (IoV) is a particular case, with vehicles being basic units, of internet of things (IoT) [1], which allows objects or devices to interact and communicate, indicating that an intrinsic component of IoT or IoV is device-to-device (D2D) communication [2]. IoV aims to build an intelligent system to improve the quality of driving or living; formally, to increase the quality of experience (QoE) or quality of service (QoS) [3]. Further study of IoV could lead to integration with smart cities, where each building, house, or even each individual device is capable of communication via wired or wireless access. On the other hand, online social networks (OSNs) have gained a growing amount of attention during recent years, and their use has almost become a daily necessity. This work has been supported by the National Natural Science Foundation of China (Nos. 61271173 and 61372068), the Research Fund for the Doctoral Program of Higher Education of China (No. 20130203110005), the Fundamental Research Funds for the Central Universities (No. K5051301033), the 111 Project (No. B08038), and also supported by the ISN State Key Laboratory. Y. Zhang, F. Tian and B. Song are with the State Key Laboratory of Integrated Services Networks, Xidian University, 710071, China (email: y.zhang@stu.xidian.edu.cn,


Reinforcement Learning Will Be 2019's Biggest Trend In Data Science

#artificialintelligence

What will be the next thing to revolutionize data science in 2019? Reinforcement learning will be the next big thing in data science in 2019. While RL has been around for a long time in academia, it has hardly seen any industry adoption at all. Why? Partly because there have been plenty of low-hanging fruits to pick in predictive analytics, but mostly because of the barriers in implementation, knowledge and available tools. The potential value in using RL in proactive analytics and AI is enormous, but it also demands a greater skillset to master.


Variational Inference with Tail-adaptive f-Divergence

arXiv.org Machine Learning

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, which can have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.
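The variance problem can be made concrete with a toy case that is not taken from the paper. Assume the target p and the proposal q are both zero-mean Gaussians with standard deviations `sigma_p` and `sigma_q`, and let w = p/q be the importance weight. Integrating w^(kα) against q gives a Gaussian integral that converges only when kα/σ_p² + (1 − kα)/σ_q² > 0. The function name and the Gaussian setting below are illustrative assumptions.

```python
def moment_is_finite(alpha, k, sigma_p, sigma_q):
    """Check whether E_q[(w**alpha)**k] is finite for w = p/q.

    Assumes p = N(0, sigma_p^2) and q = N(0, sigma_q^2). The integrand is
    proportional to exp(-x^2 / 2 * c) with
        c = k*alpha / sigma_p^2 + (1 - k*alpha) / sigma_q^2,
    so the moment is finite exactly when c > 0.
    """
    power = k * alpha
    c = power / sigma_p**2 + (1 - power) / sigma_q**2
    return c > 0


if __name__ == "__main__":
    # Proposal narrower than the target: the second moment of w diverges.
    print(moment_is_finite(alpha=1.0, k=2, sigma_p=2.0, sigma_q=1.0))
    # Proposal wider than the target: the second moment of w is finite.
    print(moment_is_finite(alpha=1.0, k=2, sigma_p=1.0, sigma_q=2.0))
```

With k = 2 this checks the second moment of w^α, which is what determines whether a Monte Carlo estimate has finite variance. A proposal that is narrower than the target is the typical failure case.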


Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach

arXiv.org Machine Learning

Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there are no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn "appropriate" spectrum access strategies in a distributed fashion assuming no knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs could make spectrum access decisions distributedly relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method which assumes the knowledge of system statistics, and converges faster than the Q-learning method when the number of channels is large.


Humans help robots learn tasks

#artificialintelligence

Bender is one of the robot arms that a team of Stanford researchers is using to test two frameworks that, together, could make it faster and easier to teach robots basic skills. The RoboTurk framework allows people to direct the robot arms in real time with a smartphone and a browser by showing the robot how to carry out tasks like picking up objects. SURREAL speeds the learning process by running multiple experiences at once, essentially allowing the robots to learn from many experiences simultaneously. "With RoboTurk and SURREAL, we can push the boundary of what robots can do by combining lots of data collected by humans and coupling that with large-scale reinforcement learning," said Mandlekar, a member of the team that developed the frameworks. The group will be presenting RoboTurk and SURREAL Oct. 29 at the Conference on Robot Learning in Zurich, Switzerland.



Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation

arXiv.org Machine Learning

Contextual policy search (CPS) is a class of multi-task reinforcement learning algorithms that is particularly useful for robotic applications. A recent state-of-the-art method is Contextual Covariance Matrix Adaptation Evolution Strategies (C-CMA-ES). It is based on the standard black-box optimization algorithm CMA-ES. There are two useful extensions of CMA-ES that we will transfer to C-CMA-ES and evaluate empirically: ACM-ES, which uses a comparison-based surrogate model, and aCMA-ES, which uses an active update of the covariance matrix. We will show that improvements with these methods can be impressive in terms of sample-efficiency, although this is not relevant any more for the robotic domain.


Multi-Stage Reinforcement Learning For Object Detection

arXiv.org Machine Learning

We present a reinforcement learning approach for detecting objects within an image. Our approach performs a step-wise deformation of a bounding box with the goal of tightly framing the object. It uses a hierarchical tree-like representation of predefined region candidates, which the agent can zoom in on. This reduces the number of region candidates that must be evaluated so that the agent can afford to compute new feature maps before each step to enhance detection quality. We compare an approach that is based purely on zoom actions with one that is extended by a second refinement stage to fine-tune the bounding box after each zoom step. We also improve the fitting ability by allowing for different aspect ratios of the bounding box. Finally, we propose different reward functions to lead to a better guidance of the agent while following its search trajectories. Experiments indicate that each of these extensions leads to more correct detections. The best performing approach comprises a zoom stage and a refinement stage, uses aspect-ratio modifying actions and is trained using a combination of three different reward metrics.
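The zoom stage can be pictured with a short sketch. It assumes boxes are `(x1, y1, x2, y2)` tuples, that each zoom step offers five sub-regions (four corners and the centre) at 75% of the current size, and that a greedy oracle with access to the ground-truth box stands in for the learned agent. The candidate layout, the scale, and the greedy rule are illustrative assumptions, not the paper's configuration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def zoom_candidates(box, scale=0.75):
    """Five hypothetical sub-regions: four corners plus the centre."""
    x1, y1, x2, y2 = box
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return [
        (x1, y1, x1 + w, y1 + h),                          # top-left
        (x2 - w, y1, x2, y1 + h),                          # top-right
        (x1, y2 - h, x1 + w, y2),                          # bottom-left
        (x2 - w, y2 - h, x2, y2),                          # bottom-right
        (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),  # centre
    ]


def greedy_zoom(box, target, steps=3):
    """Zoom while some candidate improves IoU (oracle stand-in for the agent)."""
    for _ in range(steps):
        best = max(zoom_candidates(box), key=lambda c: iou(c, target))
        if iou(best, target) <= iou(box, target):
            break  # no candidate frames the target more tightly
        box = best
    return box


if __name__ == "__main__":
    final = greedy_zoom((0, 0, 100, 100), (0, 0, 50, 50))
    print(final, round(iou(final, (0, 0, 50, 50)), 3))
```

In a learned setting the change in IoU after each action would typically feed a reward signal rather than being used directly for action selection, since the ground-truth box is unavailable at test time.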


Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

arXiv.org Artificial Intelligence

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
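One way to read the reward combination is sketched below. It assumes each world model keeps a history of its prediction errors, that learning progress is the drop in mean error between two consecutive windows, and that the intrinsic term is the ensemble average scaled by a weight `beta`. The window size, the clipping at zero, and the additive mix are illustrative assumptions and are not taken from the paper.

```python
def learning_progress(errors, window=3):
    """Drop in mean prediction error between the two most recent windows."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = sum(errors[-2 * window:-window]) / window
    recent = sum(errors[-window:]) / window
    return max(0.0, older - recent)  # only improvement is rewarded


def combined_reward(extrinsic, error_histories, beta=0.5):
    """Extrinsic reward plus beta times the ensemble's mean learning progress."""
    intrinsic = sum(learning_progress(h) for h in error_histories) / len(error_histories)
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    histories = [
        [1.0, 1.0, 1.0, 0.5, 0.5, 0.5],  # this model is still improving
        [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],  # this model has plateaued
    ]
    print(combined_reward(1.0, histories))
```

Under these assumptions the intrinsic term fades once the models stop improving, so the agent is drawn to regions where prediction is still getting better rather than to regions that are merely noisy.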