AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Joint State-Action Embedding for Efficient Reinforcement Learning

Pritz, Paul J., Ma, Liang, Leung, Kin K.

arXiv.org Artificial IntelligenceOct-12-2020

While reinforcement learning has achieved considerable successes in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representations and the latest work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions, ignoring the interaction between them when generating embedded representations. In this work, we propose a new approach for jointly embedding states and actions that combines aspects of model-free and model-based reinforcement learning, which can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions and present a generic architecture that uses these to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations of our approach on several gaming and recommender system environments show it significantly outperforms state-of-the-art models in discrete domains with large state/action space, thus confirming the efficacy of joint embedding and its overall superior performance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2010.04444

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine Learning with Applications in One Picture

#artificialintelligenceOct-11-2020, 22:02:07 GMT

Interesting picture summarizing several types of techniques used in machine learning, contrasting unsupervised learning with unsupervised learning and reinforcement learning. The difference between supervised and unsupervised learning is described here. See also this article about how this relates to reinforcement learning. The above picture was posted here.

artificial intelligence, learning, reinforcement learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

Add feedback

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

Lin, Zhengxian, Lam, Kim-Ho, Fern, Alan

arXiv.org Artificial IntelligenceOct-11-2020

We investigate a deep reinforcement learning (RL) architecture that supports explaining why a learned agent prefers one action over another. The key idea is to learn action-values that are directly represented via human-understandable properties of expected futures. This is realized via the embedded self-prediction (ESP)model, which learns said properties in terms of human provided features. Action preferences can then be explained by contrasting the future properties predicted for each action. To address cases where there are a large number of features, we develop a novel method for computing minimal sufficient explanations from anESP. Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations.

agent, computer game, upstream oil & gas, (22 more...)

arXiv.org Artificial Intelligence

2010.0518

Country:

North America > United States (0.28)
Europe > Sweden (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (0.69)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Achilles Heel Hypothesis: Pitfalls for AI Systems via Decision Theoretic Adversaries

Casper, Stephen

arXiv.org Artificial IntelligenceOct-11-2020

As progress in AI continues to advance at a rapid pace, it is crucial to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build systems which may have capabilities at or above the human level is of particular concern. One might suspect that superhumanly-intelligent systems should be modeled as as something which humans, by definition, can't outsmart. However, as a challenge to this assumption, this paper presents the Achilles Heel hypothesis which states that highly-effective goal-oriented systems -- even ones that are potentially superintelligent -- may nonetheless have stable decision theoretic delusions which cause them to make obviously irrational decisions in adversarial settings. In a survey of relevant dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in context of this hypothesis. Several novel contributions are made involving the ways in which these weaknesses could be implanted into a system.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2010.05418

Country:

North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Arizona (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.82)
Overview (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)

Add feedback

Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks

Chen, Beiran, Zhang, Yi, Iosifidis, George, Liu, Mingming

arXiv.org Artificial IntelligenceOct-10-2020

Wireless networks used for Internet of Things (IoT) are expected to largely involve cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enables flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic features of the Central Processing Unit (CPU) load variation as well as the possible complex parallelisation situations of the cloud processes makes the dynamic resource allocation an interesting research challenge. This paper models this dynamic computational resource allocation problem into a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the dynamic resource allocation of the CPU usage. Value iteration method is used for the reinforcement-learning agent to pick up the optimal policy during the MDP. To evaluate our performance we analyse two types of processes that can be used in the cloud-based IoT networks with different levels of parallelisation capabilities, i.e., Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent rapidly converges to the optimal policy, stably performs in different parameter settings, outperforms or at least equally performs compared to a baseline algorithm in energy savings for different scenarios.

cloud computing, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2010.05024

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana (0.04)

Genre: Research Report (0.70)

Industry: Information Technology > Services (0.82)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Safe Reinforcement Learning with Natural Language Constraints

Yang, Tsung-Yen, Hu, Michael, Chow, Yinlam, Ramadge, Peter J., Narasimhan, Karthik

arXiv.org Artificial IntelligenceOct-10-2020

In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average. Dataset and code to reproduce our experiments are available at https://sites.google.com/view/polco-hazard-world/.

constraint, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2010.0515

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Fast active learning for pure exploration in reinforcement learning

Ménard, Pierre, Domingues, Omar Darwiche, Jonsson, Anders, Kaufmann, Emilie, Leurent, Edouard, Valko, Michal

arXiv.org Machine LearningOct-10-2020

Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on exploring efficiently. The exploration remains a challenge while it has been addressed with many hand-tuned heuristics with different levels of generality on one side, and a few theoretically-backed exploration strategies on the other. Many of them are incarnated by intrinsic motivation and in particular explorations bonuses. A common rule of thumb for exploration bonuses is to use $1/\sqrt{n}$ bonus that is added to the empirical estimates of the reward, where $n$ is a number of times this particular state (or a state-action pair) was visited. We show that, surprisingly, for a pure-exploration objective of reward-free exploration, bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon $H$. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting, which is another pure-exploration objective, where the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2007.13442

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback

Learning not to learn: Nature versus nurture in silico

Lange, Robert Tjarko, Sprekeler, Henning

arXiv.org Artificial IntelligenceOct-9-2020

Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of meta-learning (or'learning to learn') to answer when it is beneficial to learn such an adaptive strategy and when to hard-code a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents' lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: One in which metalearning yields a learning algorithm that implements task-dependent informationintegration and a second regime in which meta-learning imprints a heuristic or'hard-coded' behavior. Further analysis reveals that nonadaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. Hard-coded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame. The'nature versus nurture' debate (e.g., Mutti et al., 1996; Tabery, 2014) - the question which aspects of behavior are'hard-coded' by evolution, and which are learned from experience - is one of the oldest and most controversial debates in biology.

agent, bayesian inference, neural network, (17 more...)

arXiv.org Artificial Intelligence

2010.04466

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Oil & Gas (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

Characterizing Policy Divergence for Personalized Meta-Reinforcement Learning

Zhang, Michael

arXiv.org Artificial IntelligenceOct-9-2020

Despite ample motivation from costly exploration and limited trajectory data, rapidly adapting to new environments with few-shot reinforcement learning (RL) can remain a challenging task, especially with respect to personalized settings. Here, we consider the problem of recommending optimal policies to a set of multiple entities each with potentially different characteristics, such that individual entities may parameterize distinct environments with unique transition dynamics. Inspired by existing literature in meta-learning, we extend previous work by focusing on the notion that certain environments are more similar to each other than others in personalized settings, and propose a model-free meta-learning algorithm that prioritizes past experiences by relevance during gradient-based adaptation. Our algorithm involves characterizing past policy divergence through methods in inverse reinforcement learning, and we illustrate how such metrics are able to effectively distinguish past policy parameters by the environment they were deployed in, leading to more effective fast adaptation during test time. To study personalization more effectively we introduce a navigation testbed to specifically incorporate environment diversity across training episodes, and demonstrate that our approach outperforms meta-learning alternatives with respect to few-shot reinforcement learning in personalized settings.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2010.04816

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Switzerland > Neuchâtel > Neuchâtel (0.04)

Genre: Research Report (0.66)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models

Colas, Cédric, Hejblum, Boris, Rouillon, Sébastien, Thiébaut, Rodolphe, Oudeyer, Pierre-Yves, Moulin-Frier, Clément, Prague, Mélanie

arXiv.org Artificial IntelligenceOct-9-2020

Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lock down, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty to predict long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms such as deep reinforcement learning, might bring significant value. However, the specificity of each domain -- epidemic modelling or solving optimization problem -- requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lock-down control under the optimization of death toll and economic recess using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, costs functions and optimization algorithms to address important choices to be made by health decision-makers.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1.12588

2010.04452

Country:

Europe > Czechia > Prague (0.05)
Europe > France > Île-de-France (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback