AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Model-Based Active Exploration

Shyam, Pranav, Jaśkowski, Wojciech, Gomez, Faustino

arXiv.org Artificial IntelligenceOct-29-2018

Efficient exploration is an unsolved problem in Reinforcement Learning. We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.

machine learning, reinforcement learning, transition, (17 more...)

arXiv.org Artificial Intelligence

1810.12162

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Social Vehicle Swarms: A Novel Perspective on Social-aware Vehicular Communication Architecture

Zhang, Yue, Tian, Fang, Song, Bin, Du, Xiaojiang

arXiv.org Artificial IntelligenceOct-29-2018

Abstract--Internet of vehicles is a promising area related to D2D communication and internet of things. We present a novel perspective for vehicular communications, social vehicle swarms, to study and analyze socially aware internet of vehicles with the assistance of an agent-based model intended to reveal hidden patterns behind superficial data. After discussing its components, namely its agents, environments, and rules, we introduce supportive technology and methods, deep reinforcement learning, privacy preserving data mining and sub-cloud computing, in order to detect the most significant and interesting information for each individual effectively, which is the key desire. Finally, several relevant research topics and challenges are discussed. NETNET of vehicles (IoV) is a particular case, with vehicles being basic units, of internet of things (IoT) [1], which allows objects or devices to interact and communicate, indicating that an intrinsic component of IoT or IoV is device-to-device (D2D) communication [2]. IoV aims to build an intelligent system to improve the quality of driving or living; formally, to increase the quality of experience (QoE) or quality of service (QoS) [3]. Further study of IoV could lead to integration with smart cities, where each building, house, or even each individual device is capable of communication via wired or wireless access. On the other hand, online social networks (OSNs) have gained a growing amount of attention during recent years, and their use has almost become a daily necessity. This work has been supported by the National Natural Science Foundation of China (Nos.61271173 and 61372068), the Research Fund for the Doctoral Program of Higher Education of China (No.20130203110005), the Fundamental Research Funds for the Central Universities (No.K5051301033), the 111 Project(No. B08038), and also supported by the ISN State Key Laboratory. Y. Zhang, F. Tian and B. Song are with the State Key Laboratory of Integrated Services Networks, Xidian University, 710071, China (email: y.zhang@stu.xidian.edu.cn,

machine learning, reinforcement learning, social vehicle swarm, (12 more...)

arXiv.org Artificial Intelligence

1810.11947

Country: Asia > China (0.65)

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Add feedback

Reinforcement Learning Will Be 2019's Biggest Trend In Data Science

#artificialintelligenceOct-28-2018, 14:34:20 GMT

What will be the next thing to revolutionize data science in 2019? Reinforcement learning will be the next big thing in data science in 2019. While RL has been around for a long time in academia, it has hardly seen any industry adoption at all. Why? Partly because there have been plenty of low-hanging fruits to pick in predictive analytics, but mostly because of the barriers in implementation, knowledge and available tools. The potential value in using RL in proactive analytics and AI is enormous, but it also demands a greater skillset to master.

biggest trend, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

AI-Alerts: 2018 > 2018-10 > AAAI AI-Alert for Oct 30, 2018 (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Add feedback

Variational Inference with Tail-adaptive f-Divergence

Wang, Dilin, Liu, Hao, Liu, Qiang

arXiv.org Machine LearningOct-28-2018

Variational inference with {\alpha}-divergences has been widely used in modern probabilistic machine learning. Compared to Kullback-Leibler (KL) divergence, a major advantage of using {\alpha}-divergences (with positive {\alpha} values) is their mass-covering property. However, estimating and optimizing {\alpha}-divergences require to use importance sampling, which could have extremely large or infinite variances due to heavy tails of importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and {\alpha}-divergences.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1810.11943

Country: North America (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach

Chang, Hao-Hsuan, Song, Hao, Yi, Yang, Zhang, Jianzhong, He, Haibo, Liu, Lingjia

arXiv.org Machine LearningOct-28-2018

Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there is no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn "appropriate" spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called the reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs could make spectrum access decisions distributedly relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method which assumes the knowledge of system statistics, and converges faster than the Q-learning method when the number of channels is large.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1109/JIOT.2018.2872441

1810.11758

Country:

Asia (0.93)
North America > United States > Virginia (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Telecommunications (1.00)
Health & Medicine > Therapeutic Area (0.57)
Government > Regional Government > North America Government > United States Government (0.46)
Information Technology > Networks (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Humans help robots learn tasks

#artificialintelligenceOct-27-2018, 02:52:00 GMT

Bender is one of the robot arms that a team of Stanford researchers is using to test two frameworks that, together, could make it faster and easier to teach robots basic skills. The RoboTurk framework allows people to direct the robot arms in real time with a smartphone and a browser by showing the robot how to carry out tasks like picking up objects. SURREAL speeds the learning process by running multiple experiences at once, essentially allowing the robots to learn from many experiences simultaneously. "With RoboTurk and SURREAL, we can push the boundary of what robots can do by combining lots of data collected by humans and coupling that with large-scale reinforcement learning," said Mandlekar, a member of the team that developed the frameworks. The group will be presenting RoboTurk and SURREAL Oct. 29 at the conference on robot learning in Zurich, Switzerland.

human help robot learn task, robot, robot arm, (7 more...)

#artificialintelligence

Country: Europe > Switzerland > Zürich > Zürich (0.57)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.37)

Add feedback

Curiosity and Procrastination in Reinforcement Learning

#artificialintelligenceOct-26-2018, 06:44:48 GMT

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation

Fabisch, Alexander

arXiv.org Machine LearningOct-26-2018

Contextual policy search (CPS) is a class of multi-task reinforcement learning algorithms that is particularly useful for robotic applications. A recent state-of-the-art method is Contextual Covariance Matrix Adaptation Evolution Strategies (C-CMA-ES). It is based on the standard black-box optimization algorithm CMA-ES. There are two useful extensions of CMA-ES that we will transfer to C-CMA-ES and evaluate empirically: ACM-ES, which uses a comparison-based surrogate model, and aCMA-ES, which uses an active update of the covariance matrix. We will show that improvements with these methods can be impressive in terms of sample-efficiency, although this is not relevant any more for the robotic domain.

evolutionary algorithm, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1810.11491

Country: North America > Canada (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.67)

Add feedback

Multi-Stage Reinforcement Learning For Object Detection

Koenig, Jonas, Malberg, Simon, Martens, Martin, Niehaus, Sebastian, Krohn-Grimberghe, Artus, Ramaswamy, Arunselvan

arXiv.org Machine LearningOct-26-2018

We present a reinforcement learning approach for detecting objects within an image. Our approach performs a step-wise deformation of a bounding box with the goal of tightly framing the object. It uses a hierarchical tree-like representation of predefined region candidates, which the agent can zoom in on. This reduces the number of region candidates that must be evaluated so that the agent can afford to compute new feature maps before each step to enhance detection quality. We compare an approach that is based purely on zoom actions with one that is extended by a second refinement stage to fine-tune the bounding box after each zoom step. We also improve the fitting ability by allowing for different aspect ratios of the bounding box. Finally, we propose different reward functions to lead to a better guidance of the agent while following its search trajectories. Experiments indicate that each of these extensions leads to more correct detections. The best performing approach comprises a zoom stage and a refinement stage, uses aspect-ratio modifying actions and is trained using a combination of three different reward metrics.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1810.10325

Country: Europe > Germany (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

Hafez, Muhammad Burhan, Weber, Cornelius, Kerzel, Matthias, Wermter, Stefan

arXiv.org Artificial IntelligenceOct-26-2018

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data- efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1810.11388

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback