AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
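
The trial-and-error idea in this definition can be made concrete with a k-armed bandit, the simple setting Sutton and Barto use to introduce action-value methods. The sketch below is a minimal, hypothetical example: an ε-greedy learner is never told which action is best and must estimate action values from sampled rewards alone. The arm means, the unit-variance Gaussian reward noise, and ε = 0.1 are illustrative assumptions.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate action values for a k-armed bandit purely from sampled rewards."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # Q(a): running mean of the rewards observed for action a
    counts = [0] * k       # N(a): number of times action a has been tried
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(k)
        else:
            action = max(range(k), key=lambda a: estimates[a])
        # The environment only returns a noisy numerical reward, never the "correct" action.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
    return estimates, counts


if __name__ == "__main__":
    q, n = run_epsilon_greedy([0.2, 0.5, 1.0])
    print("estimated values:", [round(v, 2) for v in q])
    print("times each action was tried:", n)
```

Exploration matters here: a purely greedy learner could lock onto whichever arm happened to look good first, while the occasional random action keeps every estimate improving.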

Deep reinforcement learning for large-scale epidemic control

Libin, Pieter, Moonens, Arno, Verstraeten, Timothy, Perez-Sanjines, Fabian, Hens, Niel, Lemey, Philippe, Nowé, Ann

arXiv.org Artificial Intelligence, Mar-30-2020

Epidemics of infectious diseases are an important threat to public health and global economies. Yet, the development of prevention strategies remains a challenging process, as epidemics are non-linear and complex processes. For this reason, we investigate a deep reinforcement learning approach to automatically learn prevention strategies in the context of pandemic influenza. Firstly, we construct a new epidemiological meta-population model, with 379 patches (one for each administrative district in Great Britain), that adequately captures the infection process of pandemic influenza. Our model balances complexity and computational efficiency such that the use of reinforcement learning techniques becomes attainable. Secondly, we set up a ground truth such that we can evaluate the performance of the 'Proximal Policy Optimization' algorithm to learn in a single district of this epidemiological model. Finally, we consider a large-scale problem by conducting an experiment in which we aim to learn a joint policy to control the districts in a community of 11 tightly coupled districts, for which no ground truth can be established. This experiment shows that deep reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space. Moreover, through this experiment, we demonstrate that there can be an advantage in considering collaboration between districts when designing prevention strategies.
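
To make the meta-population idea concrete, here is a deliberately tiny, deterministic SIR sketch: three coupled patches instead of 379, a hypothetical mixing matrix, and a hypothetical intervention that halves transmission in a patch. The per-step reward (negative new infections minus a small cost per active intervention) is an assumption of this sketch, not the reward used in the paper; a learner such as PPO would choose the `closed` flags each day to maximize the summed reward.

```python
BETA, GAMMA = 0.3, 0.1  # hypothetical transmission and recovery rates (R0 = 3)
MIXING = [              # MIXING[i][j]: share of patch i's contacts made with patch j
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]


def step(S, I, R, closed):
    """Advance the SIR meta-population model by one day; S, I, R hold per-patch fractions."""
    n = len(S)
    S2, I2, R2 = [], [], []
    for i in range(n):
        beta_i = BETA * (0.5 if closed[i] else 1.0)  # intervention halves transmission
        force = beta_i * sum(MIXING[i][j] * I[j] for j in range(n))  # force of infection
        infections = force * S[i]
        recoveries = GAMMA * I[i]
        S2.append(S[i] - infections)
        I2.append(I[i] + infections - recoveries)
        R2.append(R[i] + recoveries)
    return S2, I2, R2


def simulate(policy, days=200, closure_cost=0.001):
    """Roll out policy(day, I) -> list of booleans; return (attack rate, total reward)."""
    S, I, R = [0.999, 1.0, 1.0], [0.001, 0.0, 0.0], [0.0, 0.0, 0.0]
    total_reward = 0.0
    for day in range(days):
        closed = policy(day, I)
        S2, I2, R2 = step(S, I, R, closed)
        new_infections = sum(s - s2 for s, s2 in zip(S, S2))
        total_reward -= new_infections + closure_cost * sum(closed)
        S, I, R = S2, I2, R2
    attack_rate = 1.0 - sum(S) / len(S)  # mean fraction of each patch ever infected
    return attack_rate, total_reward


if __name__ == "__main__":
    never = lambda day, I: [False, False, False]
    always = lambda day, I: [True, True, True]
    print("no intervention:", simulate(never))
    print("always intervene:", simulate(always))
```

The coupling through `MIXING` is what makes the control problem joint: an intervention in one patch also lowers the force of infection its neighbours experience.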

epidemiological model, experiment, reinforcement

2003.13676

Country:

Europe > United Kingdom > England (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Multi-Task Reinforcement Learning with Soft Modularization

Yang, Ruihan, Xu, Huazhe, Wu, Yi, Wang, Xiaolong

arXiv.org Artificial Intelligence, Mar-30-2020

Multi-task learning is a very challenging problem in reinforcement learning. While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it is unclear which parameters in the network should be reused across tasks, and the gradients from different tasks may interfere with each other. Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue. Given a base policy network, we design a routing network which estimates different routing strategies to reconfigure the base network for each task. Instead of creating a concrete route for each task, our task-specific policy is represented by a soft combination of all possible routes. We name this approach soft modularization. We experiment with multiple robotic manipulation tasks in simulation and show that our method improves sample efficiency and performance over baselines by a large margin.
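
A minimal sketch of the soft-routing idea in plain Python: each layer holds several modules, and the input to each module in the next layer is a softmax-weighted mix of all module outputs from the previous layer, so a task selects a soft combination of routes rather than one hard path. As simplifying assumptions, the routing logits are passed in directly per task (the abstract describes a separate routing network that estimates them), modules are single ReLU layers with random weights, the final output is an average of the last layer's modules, and the task names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def module(weights, bias, x):
    """One module: a fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def soft_modular_forward(x, layers, routing_logits):
    """layers[l][m] = (weights, bias) of module m in layer l.
    routing_logits[l][j][i]: task-specific logit for routing module i of layer l
    into module j of layer l + 1."""
    outputs = [module(w, b, x) for w, b in layers[0]]  # first layer sees the raw input
    for l in range(1, len(layers)):
        next_outputs = []
        for j, (w, b) in enumerate(layers[l]):
            probs = softmax(routing_logits[l - 1][j])  # soft route over previous modules
            mixed = [sum(p * out[d] for p, out in zip(probs, outputs))
                     for d in range(len(outputs[0]))]
            next_outputs.append(module(w, b, mixed))
        outputs = next_outputs
    # Combine the last layer's modules by simple averaging (a simplification).
    return [sum(out[d] for out in outputs) / len(outputs) for d in range(len(outputs[0]))]


def random_layers(num_layers, num_modules, dim, rng):
    return [[([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)], [0.0] * dim)
             for _ in range(num_modules)] for _ in range(num_layers)]


if __name__ == "__main__":
    rng = random.Random(0)
    layers = random_layers(num_layers=3, num_modules=4, dim=5, rng=rng)
    x = [rng.uniform(-1, 1) for _ in range(5)]
    for task in ("reach", "push"):  # hypothetical task names
        logits = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)] for _ in range(2)]
        print(task, [round(v, 3) for v in soft_modular_forward(x, layers, logits)])
```

All tasks share the module weights; only the routing weights differ, which is what lets parameters be shared without forcing every task through the same path.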

baseline, different task, module

2003.13661

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Reinforcement Learning with Weighted Q-Learning

Cini, Andrea, D'Eramo, Carlo, Peters, Jan, Alippi, Cesare

arXiv.org Artificial Intelligence, Mar-30-2020

Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where the weights correspond to the probability of each action-value being the maximum; however, the computation of these probabilities is only practical in tabular settings. In this work, we provide methodological advances that let Deep Reinforcement Learning (DRL) benefit from the WQL properties, by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also for action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.
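
The core of the weighted estimator can be sketched in the tabular case: instead of taking the largest estimated action-value (which is biased upward), weight each estimate by the probability that its action is the maximum. Here those probabilities are approximated by Monte Carlo sampling from independent Gaussians centred on the sample means with their standard errors; this is an assumption of the sketch, loosely analogous to sampling dropout masks in the deep variant, and not the paper's implementation. In the toy experiment every action's true value is 0, so any positive average is bias.

```python
import random
import statistics


def weighted_estimate(means, std_errs, num_draws, rng):
    """Weighted estimate of max_a Q(a): each mean is weighted by the (sampled) probability
    that its action is the maximum under independent Gaussians N(mean, std_err^2)."""
    wins = [0] * len(means)
    for _ in range(num_draws):
        draws = [rng.gauss(m, s) for m, s in zip(means, std_errs)]
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    return sum(w / num_draws * m for w, m in zip(wins, means))


def compare_bias(num_actions=10, samples_per_action=20, trials=100, seed=1):
    """Average max-estimator and weighted-estimator values when every true action-value is 0."""
    rng = random.Random(seed)
    max_vals, weighted_vals = [], []
    for _ in range(trials):
        data = [[rng.gauss(0.0, 1.0) for _ in range(samples_per_action)]
                for _ in range(num_actions)]
        means = [statistics.fmean(d) for d in data]
        std_errs = [statistics.stdev(d) / samples_per_action ** 0.5 for d in data]
        max_vals.append(max(means))
        weighted_vals.append(weighted_estimate(means, std_errs, 500, rng))
    return statistics.fmean(max_vals), statistics.fmean(weighted_vals)


if __name__ == "__main__":
    max_bias, weighted_bias = compare_bias()
    print(f"max estimator bias: {max_bias:.3f}, weighted estimator bias: {weighted_bias:.3f}")
```

Because the weighted estimate is a convex combination of the means, it can never exceed the max estimate, which is one intuition for why it overestimates less.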

agent, algorithm, q-learning

2003.0928

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Deep Reinforcement Learning

#artificialintelligence, Mar-29-2020, 09:26:14 GMT

For all lectures, slides, and lab materials: http://introtodeeplearning.com

deep reinforcement learning

Genre: Instructional Material > Course Syllabus & Notes (0.91)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Google Introduces Neuroevolution for Self-Interpretable Agents

#artificialintelligence, Mar-29-2020, 02:58:23 GMT

Good gamers can tune out distractions and unimportant on-screen information and focus their attention on avoiding obstacles and overtaking others in virtual racing games like Mario Kart. However, can machines behave similarly in such vision-based tasks? A possible solution is designing agents that encode and process abstract concepts, and research in this area has focused on learning all abstract information from visual inputs. This, however, is compute-intensive and can even degrade model performance. Now, researchers from Google Brain Tokyo and Google Japan have proposed a novel approach that helps guide reinforcement learning (RL) agents to what's important in vision-based tasks.

google introduce neuroevolution, self-interpretable agent, vision-based task

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.27)

Genre: Research Report (0.39)

Industry: Leisure & Entertainment (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer

Rostami, Mohammad (University of Pennsylvania), Isele, David, Eaton, Eric

Journal of Artificial Intelligence Research, Mar-29-2020

Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.
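
The zero-shot step can be illustrated with a minimal sketch, assuming the two coupled dictionaries have already been learned: a policy-parameter dictionary `L` and a task-descriptor dictionary `D` share one code `s` per task. Given only a new descriptor, the sketch solves for `s` against `D` and reads off the policy parameters as `L s`. For simplicity it swaps in ridge-regularized least squares for sparse coding, and the dictionary values are made up for the example.

```python
def solve(A, b):
    """Solve A x = b for square, non-singular A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def zero_shot_policy(L, D, descriptor, lam=1e-6):
    """Predict policy parameters for an unseen task from its descriptor alone."""
    k = len(D[0])
    # Ridge code: (D^T D + lam * I) s = D^T descriptor
    DtD = [[sum(row[i] * row[j] for row in D) + (lam if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    Dt_phi = [sum(row[i] * p for row, p in zip(D, descriptor)) for i in range(k)]
    s = solve(DtD, Dt_phi)
    theta = [sum(l * si for l, si in zip(row, s)) for row in L]  # theta = L s
    return theta, s


if __name__ == "__main__":
    D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]              # descriptor dictionary (3 x 2)
    L = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]  # policy dictionary (4 x 2)
    phi_new = [0.5, -1.0, -0.5]                           # descriptor of an unseen task
    theta, s = zero_shot_policy(L, D, phi_new)
    print("code:", [round(v, 4) for v in s], "policy parameters:", [round(v, 4) for v in theta])
```

No training data from the new task is touched: the shared code is the only bridge between what the descriptor says and what the policy should be.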

descriptor, learning, task descriptor

doi: 10.1613/jair.1.11304

AI Access Foundation

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material (0.73)
Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)

Learning and Testing Variable Partitions

Bogdanov, Andrej, Wang, Baoxiang

arXiv.org Machine Learning, Mar-29-2020

Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular, we show the following. (1) Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$, learning a partition of cost $\delta + \epsilon$ is NP-hard. (2) When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. (3) When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context.
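
A brute-force check helps pin down the definition for real-valued functions under the sup-norm error (an assumption of this sketch; the abstract covers other groups and metrics). For a candidate 2-partition (X, Y), fixing reference values x0, y0 and setting F1(x) = F(x, y0) and F2(y) = F(x0, y) - F(x0, y0) reproduces F exactly whenever F is additively separable on that partition, so the worst deviation of this decomposition is zero exactly for valid partitions. Unlike the query-efficient algorithms in the abstract, this sketch enumerates all |Σ|^n inputs and every partition, so it only suits tiny n.

```python
from itertools import product


def separability_gap(F, n, alphabet, X):
    """Worst-case |F(v) - (F1(x) + F2(y))| for the reference-point decomposition
    F1(x) = F(x, y0), F2(y) = F(x0, y) - F(x0, y0), where X lists the variable indices
    of the first block. Zero exactly when F splits additively on (X, rest); otherwise
    it is only an upper bound on the best achievable sup-norm cost for this partition."""
    Y = [i for i in range(n) if i not in X]

    def join(xs, ys):
        v = [None] * n
        for i, a in zip(X, xs):
            v[i] = a
        for i, a in zip(Y, ys):
            v[i] = a
        return tuple(v)

    x0 = (alphabet[0],) * len(X)
    y0 = (alphabet[0],) * len(Y)
    base = F(join(x0, y0))
    worst = 0.0
    for xs in product(alphabet, repeat=len(X)):
        for ys in product(alphabet, repeat=len(Y)):
            approx = F(join(xs, y0)) + F(join(x0, ys)) - base
            worst = max(worst, abs(F(join(xs, ys)) - approx))
    return worst


def best_two_partition(F, n, alphabet):
    """Try every 2-partition (variable 0 always in the first block to skip mirror images)."""
    best = None
    for mask in range(1, 2 ** n - 1):
        if not mask & 1:
            continue
        X = [i for i in range(n) if mask >> i & 1]
        gap = separability_gap(F, n, alphabet, X)
        if best is None or gap < best[0]:
            best = (gap, X)
    return best


if __name__ == "__main__":
    F = lambda v: v[0] * v[1] + v[2]  # separable on ({0, 1}, {2}) but not otherwise
    print(best_two_partition(F, 3, (0, 1)))
```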

algorithm, partition, probability

2003.1299

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey

Zhang, Chongzhen, Wang, Jianrui, Yen, Gary G., Zhao, Chaoqiang, Sun, Qiyu, Tang, Yang, Qian, Feng, Kurths, Jürgen

arXiv.org Artificial Intelligence, Mar-29-2020

With widespread applications of artificial intelligence (AI), the perception, understanding, decision-making and control capabilities of autonomous systems have improved significantly in recent years. When autonomous systems must achieve accuracy and transferability simultaneously, several AI methods, such as adversarial learning, reinforcement learning (RL) and meta-learning, have shown strong performance. Here, we review the learning-based approaches in autonomous systems from the perspectives of accuracy and transferability. Accuracy means that a well-trained model shows good results during the testing phase, in which the testing set shares the same task or data distribution as the training set. Transferability means that when a trained model is transferred to other testing domains, its accuracy remains good. Firstly, we introduce some basic concepts of transfer learning and then present some preliminaries of adversarial learning, RL and meta-learning. Secondly, we focus on reviewing the accuracy and transferability to show the advantages of adversarial learning, like generative adversarial networks (GANs), in typical computer vision tasks in autonomous systems, including image style transfer, image super-resolution, image deblurring/dehazing/rain removal, semantic segmentation, depth estimation and person re-identification. Then, we further review the performance of RL and meta-learning from the aspects of accuracy and transferability in autonomous systems, involving robot navigation and robotic manipulation. Finally, we discuss several challenges and future topics for using adversarial learning, RL and meta-learning in autonomous systems.

computer vision, proceedings, transferability

2003.12948

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
Europe > Germany > Brandenburg > Potsdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Google is using AI to design chips that will accelerate AI

#artificialintelligence, Mar-28-2020, 04:23:59 GMT

A new reinforcement-learning algorithm has learned to optimize the placement of components on a computer chip to make it more efficient and less power-hungry. Chip placement requires the careful configuration of hundreds, sometimes thousands, of components across multiple layers in a constrained area. Traditionally, engineers manually design configurations that minimize the amount of wire used between components as a proxy for efficiency. They then use electronic design automation software to simulate and verify their performance, which can take up to 30 hours for a single floor plan. Time lag: because of the time invested in each chip design, chips are traditionally supposed to last between two and five years.
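
The wire-length proxy mentioned above is commonly estimated with half-perimeter wirelength (HPWL): for each net, the half perimeter of the bounding box around the pins it connects. The sketch below computes it and, as a stand-in for a learned placement policy, exhaustively searches the grid cells for one movable component. The component names, coordinates, nets, and grid size are hypothetical, and treating negative HPWL as a reward is an assumption rather than a description of Google's method.

```python
def hpwl(placement, nets):
    """Half-perimeter wirelength: sum over nets of the bounding-box half perimeter of its pins."""
    total = 0.0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def best_cell_for(placement, nets, movable, grid=6):
    """Try every free grid cell for one movable component and keep the lowest-HPWL spot."""
    occupied = {pos for name, pos in placement.items() if name != movable}
    best_pos, best_cost = None, float("inf")
    for x in range(grid):
        for y in range(grid):
            if (x, y) in occupied:
                continue
            cost = hpwl({**placement, movable: (x, y)}, nets)
            if cost < best_cost:
                best_pos, best_cost = (x, y), cost
    return best_pos, best_cost


if __name__ == "__main__":
    placement = {"cpu": (0, 0), "cache": (2, 1), "io": (5, 4)}  # hypothetical components
    nets = [("cpu", "cache"), ("cpu", "cache", "io")]
    print("initial HPWL:", hpwl(placement, nets))  # 3 + 9 = 12.0
    print("best spot for io:", best_cell_for(placement, nets, "io"))
```

Exhaustive search works for one component on a small grid, but the number of joint placements grows combinatorially with the component count, which is why learned or heuristic placers are used in practice.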

accelerate ai, algorithm, electronic design automation software

Industry: Semiconductors & Electronics (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

Rakhsha, Amin, Radanovic, Goran, Devidze, Rati, Zhu, Xiaojin, Singla, Adish

arXiv.org Artificial Intelligence, Mar-28-2020

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an optimal stealthy attack for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an offline setting where the agent is doing planning in the poisoned environment, and (ii) an online setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
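
A toy version of reward poisoning makes the threat model concrete. The sketch shrinks the setting to a single-state problem (a bandit) rather than the average-reward MDPs the abstract studies, and it uses a simple hypothetical rule: lower every competing action's reward until the attacker's target action wins by a margin ε. This attack is feasible but not necessarily cost-optimal; the paper instead solves an optimization problem over reward or transition changes. The rewards, target, and margin are made-up values.

```python
def poison_rewards(rewards, target, margin):
    """Lower each competing action's reward so `target` is optimal by at least `margin`.
    Returns (poisoned rewards, L1 attack cost)."""
    ceiling = rewards[target] - margin
    poisoned = [r if a == target else min(r, ceiling) for a, r in enumerate(rewards)]
    cost = sum(abs(p - r) for p, r in zip(poisoned, rewards))
    return poisoned, cost


def greedy_action(rewards):
    """A victim that has learned the (poisoned) rewards exactly and acts greedily."""
    return max(range(len(rewards)), key=lambda a: rewards[a])


if __name__ == "__main__":
    true_rewards = [1.0, 0.4, 0.7]
    poisoned, cost = poison_rewards(true_rewards, target=1, margin=0.1)
    print("victim's choice before:", greedy_action(true_rewards))  # action 0
    print("victim's choice after:", greedy_action(poisoned))       # action 1
    print("attack cost:", round(cost, 3))                          # 1.1
```

Only competitors that would otherwise beat the target are changed, so actions already well below the target add nothing to the cost.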

agent, optimization problem, policy teaching

2003.12909

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)