AITopics

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

abstraction, machine learning, reinforcement learning, (14 more...)

2003.06016

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Option Discovery in the Absence of Rewards with Manifold Analysis

Bar, Amitay, Talmon, Ronen, Meir, Ron

Options have been shown to be an effective tool in reinforcement learning, facilitating improved exploration and learning. In this paper, we present an approach based on spectral graph theory and derive an algorithm that systematically discovers options without access to a specific reward or task assignment. As opposed to the common practice used in previous methods, our algorithm makes full use of the spectrum of the graph Laplacian. Incorporating modes associated with higher graph frequencies unravels domain subtleties, which are shown to be useful for option discovery. Using geometric and manifold-based analysis, we present a theoretical justification for the algorithm. In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods.

diffusion distance, diffusion option, eigenvector, (13 more...)

2003.05878

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Barros, Pablo, Bloem, Anne C., Hootsmans, Inge M., Opheij, Lena M., Toebosch, Romain H. A., Barakova, Emilia, Sciutti, Alessandra

The Chef's Hat Simulation Environment for Reinforcement-Learning-Based Agents

To achieve social interactions within Human-Robot Interaction (HRI) environments is a very challenging task. Most of the current research focuses on Wizard-of-Oz approaches, which neglect the recent development of intelligent robots. On the other hand, real-world scenarios usually do not provide the necessary control and reproducibility which are needed for learning algorithms. In this paper, we propose a virtual simulation environment that implements the Chef's Hat card game, designed to be used in HRI scenarios, to provide a controllable and reproducible scenario for reinforcement-learning algorithms.

agent, experiment, robot, (16 more...)

2003.05861

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A General Framework for Learning Mean-Field Games

Guo, Xin, Hu, Anran, Xu, Renyuan, Zhang, Junzi

This paper is motivated by the following Ad auction problem for an advertiser. An Ad auction is a stochastic game on an Ad exchange platform among a large number of players, the advertisers. In between the time a web user requests a page and the time the page is displayed, usually within a millisecond, a Vickrey-type of second-best-price auction is run to incentivize interested advertisers to bid for an Ad slot to display advertisement. Each advertiser has limited information before each bid: first, her own valuation for a slot depends on an unknown conversion of clicks for the item; secondly, she, should she win the bid, only knows the reward after the user's activities on the website are finished. In addition, she has a budget constraint in this repeated auction. The question is, how should she bid in this online sequential repeated game when there is a large population of bidders competing on the Ad platform, with unknown distributions of the conversion of clicks and rewards? Besides the Ad auction, there are many real-world problems involving a large number of players and unknown systems. Examples include massive multi-player online roleplaying games [30], high frequency tradings [35], and the sharing economy [24].

algorithm, diam, gmfg, (15 more...)

2003.06069

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry:

Marketing (0.68)
Leisure & Entertainment > Games (0.48)
Information Technology > Services (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Scheller, Christian, Schraner, Yanick, Vogel, Manfred

Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft

Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3 rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.

demonstration, learning, sample efficient reinforcement learning, (11 more...)

2003.06066

Country: Europe > Switzerland (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Heterogeneous Relational Reasoning in Knowledge Graphs with Reinforcement Learning

Saebi, Mandana, Krieg, Steven, Zhang, Chuxu, Jiang, Meng, Chawla, Nitesh

Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommender systems. In recent years, reinforcement learning (RL) has provided solutions that are more interpretable and explainable than other deep learning models. However, these solutions still face several challenges, including large action space for the RL agent and accurate representation of entity neighborhood structure. We address these problems by introducing a type-enhanced RL agent that uses the local neighborhood information for efficient path-based reasoning over knowledge graphs. Our solution uses graph neural network (GNN) for encoding the neighborhood information and utilizes entity types to prune the action space. Experiments on real-world dataset show that our method outperforms state-of-the-art RL methods and discovers more novel paths during the training procedure.

dataset, information, relation, (12 more...)

2003.0605

Country:

North America > United States > New York (0.05)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Namkoong, Hongseok, Keramati, Ramtin, Yadlowsky, Steve, Brunskill, Emma

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders, unrecorded variables that impact both the decisions and their outcomes. We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy. When unobserved confounders can affect every decision in an episode, we demonstrate that even small amounts of per-decision confounding can heavily bias OPE methods. Fortunately, in a number of important settings found in healthcare, policy-making, operations, and technology, unobserved confounders may primarily affect only one of the many decisions made. Under this less pessimistic model of one-decision confounding, we propose an efficient loss-minimization-based procedure for computing worst-case bounds, and prove its statistical consistency. On two simulated healthcare examples---management of sepsis patients and developmental interventions for autistic children---where this is a reasonable model of confounding, we demonstrate that our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable selection of policies even under unobserved confounding.

behavior policy, evaluation policy, off-policy policy evaluation, (11 more...)

2003.05623

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Health Care Providers & Services (0.92)
Health & Medicine > Therapeutic Area > Neurology > Autism (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Analyzing Visual Representations in Embodied Navigation Tasks

Wijmans, Erik, Straub, Julian, Batra, Dhruv, Essa, Irfan, Hoffman, Judy, Morcos, Ari

Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often over specialized to the target task. In this work, we present a methodology to study the underlying potential causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks. We then leverage our proposed methodology to examine the task dependence of visual representations learned on related but distinct embodied navigation tasks. Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures. We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.

agent, representation, visual representation, (15 more...)

2003.05993

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

Wang, Hui, Emmerich, Michael, Preuss, Mike, Plaat, Aske

The landmark achievements of AlphaGo Zero have created great research interest into self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network, that is then used in tree searches. Training itself is governed by many hyperparameters.There has been surprisingly little research on design choices for hyper-parameter values and loss-functions, presumably because of the prohibitive computational cost to explore the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games, to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters to further assess. To start, we find surprising results where too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game-episodes, and training epochs. The intuition is that these three increase together as self-play iterations increase, and that increasing them individually is sub-optimal. A consequence of our experiments is a direct recommendation for setting hyper-parameter values in self-play: the overarching outer-loop of self-play iterations should be maximized, in favor of the three inner-loop hyper-parameters, which should be set at lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.

elo rating, hyper-parameter, iteration, (14 more...)

2003.05988

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Portugal > Setubal > Setubal (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Go (0.35)
Leisure & Entertainment > Games > Chess (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

#artificialintelligenceMar-11-2020, 16:43:44 GMT

Training batch reinforcement learning policies with Amazon SageMaker RL Amazon Web Services

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. In addition to building ML models using more commonly used supervised and unsupervised learning techniques, you can also build reinforcement learning (RL) models using Amazon SageMaker RL. Amazon SageMaker RL includes pre-built RL libraries and algorithms that make it easy to get started with reinforcement learning. For more information, see Amazon SageMaker RL – Managed Reinforcement Learning with Amazon Sagemaker. Amazon SageMaker RL makes it easy to integrate with various simulation environments such as AWS RoboMaker, Open AI Gym, open-source environments, and custom-built environments for training RL models.

information, machine learning, reinforcement learning, (13 more...)

#artificialintelligence

Country:

North America > United States > Colorado > El Paso County > Colorado Springs (0.05)
North America > United States > California > Santa Barbara County > Santa Barbara (0.05)
North America > United States > California > San Diego County > San Diego (0.05)

Industry:

Retail > Online (0.40)
Information Technology > Services (0.40)
Leisure & Entertainment > Games (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)