AITopics

Machine learning has shown growing success in recent years. However, current machine learning systems are highly specialized, trained for particular problems or domains, and typically on a single narrow dataset. Human learning, on the other hand, is highly general and adaptable. Never-ending learning is a machine learning paradigm that aims to bridge this gap, with the goal of encouraging researchers to design machine learning systems that can learn to perform a wider variety of inter-related tasks in more complex environments. To date, there is no environment or testbed to facilitate the development and evaluation of never-ending learning systems. To this end, we propose the Jelly Bean World testbed. The Jelly Bean World allows experimentation over two-dimensional grid worlds that are filled with items and in which agents can navigate. This testbed provides environments that are sufficiently complex and where more generally intelligent algorithms ought to perform better than current state-of-the-art reinforcement learning approaches. It does so by producing non-stationary environments and facilitating experimentation with multi-task, multi-agent, multi-modal, and curriculum learning settings. We hope that this new freely available software will prompt new research and interest in the development and evaluation of never-ending learning systems and, more broadly, general intelligence systems.

agent, experiment, never-ending learning

2002.06306

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Sweden > Skåne County > Malmö (0.04)
Oceania > New Zealand (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.94)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Warchalski, Michal, Radojevic, Dimitrije, Milosevic, Milos

Deep RL Agent for a Real-Time Action Strategy Game

We introduce a reinforcement learning environment based on Heroic - Magic Duel, a 1 v 1 action strategy game. This domain is non-trivial for several reasons: it is a real-time game, the state space is large, the information given to the player before and at each step of a match is imperfect, and the distribution of actions is dynamic. Our main contribution is a deep reinforcement learning agent that plays the game at a competitive level, trained using PPO and self-play with multiple competing agents and employing only a simple reward of $\pm 1$ depending on the outcome of a single match. Our best self-play agent obtains around a $65\%$ win rate against the existing AI and over a $50\%$ win rate against a top human player.
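The reward described above is sparse: nothing is given during a match, and a single +1 or -1 arrives at the end. The following minimal Python sketch shows that signal and how discounting spreads it back over earlier steps of the match. The function names and the discount factor are illustrative assumptions; the abstract does not say how PPO's returns or advantages were computed for this agent.

```python
def terminal_rewards(num_steps: int, won: bool) -> list[float]:
    """Per-step rewards for one match: zero everywhere except the final step."""
    if num_steps < 1:
        raise ValueError("a match must last at least one step")
    rewards = [0.0] * num_steps
    rewards[-1] = 1.0 if won else -1.0  # the only learning signal: win or loss
    return rewards


def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Compute G_t = r_t + gamma * G_{t+1} by sweeping backwards from the last step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A lost match of 4 steps: earlier steps receive smaller (discounted) blame.
    print(discounted_returns(terminal_rewards(4, won=False), gamma=0.9))
```

With such a sparse signal, every action in a match is credited only through its discounted share of the final outcome, which is part of what makes long real-time matches hard to learn from.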

agent, reinforcement

2002.0629

Country:

Europe > Serbia > Central Serbia > Belgrade (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Beloborodov, Dmitrii, Ulanov, A. E., Foerster, Jakob N., Whiteson, Shimon, Lvovsky, A. I.

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The agent controls the algorithm by tuning one of its parameters with the goal of improving recently seen solutions. We propose a new Rescaled Ranked Reward (R3) method that enables a stable single-player version of self-play training and helps the agent escape local optima. Training on any problem instance can be accelerated by applying transfer learning from an agent trained on randomly generated problems. Our approach allows sampling high-quality solutions to the Ising problem with high probability and outperforms both baseline heuristics and a black-box hyperparameter optimization approach.
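The equivalence between Ising energy minimization and Maximum Cut mentioned above can be checked directly on a toy graph. This sketch assumes spins in {-1, +1}, no external field, and the sign convention E(s) = -sum_{i<j} J_ij s_i s_j; under the mapping J_ij = -w_ij, the cut weight equals (sum of w_ij - E(s)) / 2, so the lowest-energy spin configuration is a maximum cut. The example graph is made up, and the brute-force search merely stands in for the quantum-inspired solver and the RL agent, neither of which is reproduced here.

```python
from itertools import product

# Edge weights of a small undirected graph, stored once per edge (i < j).
Edges = dict[tuple[int, int], float]


def ising_energy(spins: tuple[int, ...], couplings: Edges) -> float:
    """E(s) = -sum_{i<j} J_ij * s_i * s_j with spins in {-1, +1} and no external field."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def cut_value(spins: tuple[int, ...], weights: Edges) -> float:
    """Total weight of edges whose endpoints lie on opposite sides of the partition."""
    return sum(w for (a, b), w in weights.items() if spins[a] != spins[b])


def maxcut_to_ising(weights: Edges) -> Edges:
    """Choose J_ij = -w_ij, so that cut(s) = (sum(w) - E(s)) / 2 for every s."""
    return {edge: -w for edge, w in weights.items()}


if __name__ == "__main__":
    weights = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 1.5, (2, 3): 1.0}
    couplings = maxcut_to_ising(weights)
    total = sum(weights.values())
    # Brute force over all 2^n spin assignments (only feasible for toy graphs).
    for spins in product((-1, 1), repeat=4):
        assert cut_value(spins, weights) == (total - ising_energy(spins, couplings)) / 2
    best = min(product((-1, 1), repeat=4), key=lambda s: ising_energy(s, couplings))
    print(best, cut_value(best, weights))
```

Brute force is exponential in the number of spins, which is why heuristic and quantum-inspired samplers are used on realistic instances.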

agent, optimization, reinforcement learning enhanced quantum-inspired algorithm

2002.04676

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Campos, Víctor, Trott, Alexander, Xiong, Caiming, Socher, Richard, Giro-i-Nieto, Xavier, Torres, Jordi

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted on understanding their limitations. Through theoretical analysis and empirical evidence, we show that existing algorithms suffer from a common limitation: they discover options that provide poor coverage of the state space. In light of this, we propose 'Explore, Discover and Learn' (EDL), an alternative approach to information-theoretic skill discovery. Crucially, EDL optimizes the same information-theoretic objective derived from the empowerment literature, but addresses the optimization problem using different machinery. We perform an extensive evaluation of skill discovery methods in controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.
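Information-theoretic skill discovery maximizes the mutual information I(S;Z) between visited states S and skills Z, which can be written in a reverse form, H(Z) - H(Z|S), or a forward form, H(S) - H(S|Z). Skill discovery methods differ in which form they optimize and how they estimate each term. The toy computation below only checks, on a hand-made discrete joint distribution, that the two forms agree; it is not an implementation of EDL's explore, discover and learn stages.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information_both_forms(joint):
    """Return (H(Z) - H(Z|S), H(S) - H(S|Z)) for a joint table joint[s][z] = p(s, z)."""
    p_s = [sum(row) for row in joint]
    p_z = [sum(col) for col in zip(*joint)]
    # H(Z|S) = sum_s p(s) H(Z | S = s): how well skills can be inferred from states.
    h_z_given_s = sum(
        ps * entropy([p / ps for p in row]) for row, ps in zip(joint, p_s) if ps > 0
    )
    # H(S|Z) = sum_z p(z) H(S | Z = z): how concentrated each skill's states are.
    h_s_given_z = sum(
        pz * entropy([row[k] / pz for row in joint]) for k, pz in enumerate(p_z) if pz > 0
    )
    reverse_form = entropy(p_z) - h_z_given_s
    forward_form = entropy(p_s) - h_s_given_z
    return reverse_form, forward_form


if __name__ == "__main__":
    # Two skills that visit disjoint sets of states: I(S;Z) = H(Z) = log 2.
    joint = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.5]]
    print(mutual_information_both_forms(joint))
```

The forward form makes the role of H(S), the spread of visited states, explicit, which is one way to see why state coverage matters for this objective.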

discover and learn, mutual information, unsupervised discovery

2002.03647

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.88)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

RL agents Implicitly Learning Human Preferences

Wichers, Nevan

In the real world, RL agents should be rewarded for fulfilling human preferences. We show that RL agents implicitly learn the preferences of humans in their environment. A classifier trained to predict whether a simulated human's preferences are fulfilled, based on the activations of an RL agent's neural network, achieves 0.93 AUC. A classifier trained on the raw environment state achieves only 0.8 AUC. Training the classifier on the RL agent's activations also performs much better than training it on activations from an autoencoder. The human preference classifier can be used as the reward function of an RL agent to make the agent more beneficial for humans.
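The 0.93 and 0.8 figures above are ROC AUC values: the probability that the classifier scores a randomly chosen positive example (preferences fulfilled) above a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect separation. A minimal pairwise computation is sketched below, assuming binary labels where 1 means the simulated human's preference is fulfilled; the scores are made up, and this is not the paper's evaluation code.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores; label 1 = preference fulfilled.
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The quadratic pairwise loop is fine for a handful of examples; a rank-based formula is preferable for large evaluation sets.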

agent, human preference, rl agent

2002.06137

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

León, Borja G., Belardinelli, Francesco

Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning

Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To introduce this new model we provide formal definitions and proofs as well as empirical tests of RL algorithms running on this framework. Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.

This paper focuses on formally extending Markov Games (MGs), the mathematical model traditionally used in multi-agent RL (MARL), to build a new general model, i.e., one not focused solely on one kind of multi-agent game, that allows multiple learning agents to concurrently fulfill various non-Markovian specifications in multi-agent settings. To support our model with empirical evidence, we also extended two logic-based RL algorithms to multi-agent systems in order to show how various learning agents can fulfill different types of non-Markovian specifications expressed in co-safe Linear Temporal Logic (LTL). Our results are promising and point to...
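Co-safe LTL specifications can be tracked with a finite automaton that each agent carries alongside the environment state, which is one common way to make a non-Markovian reward Markovian in an extended state. The sketch below hand-codes the automaton for the formula F(a & F b) ("eventually a, and afterwards eventually b") and gives each agent a reward of 1 on the step its own specification is first satisfied. The class and function names, the labels and the reward scheme are assumptions for illustration; this is not the paper's formal definition of Extended Markov Games or either of its algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventuallyThen:
    """Hand-built DFA for the co-safe LTL formula F(first & F second)."""
    first: str
    second: str

    def step(self, q: int, labels: frozenset[str]) -> int:
        # q = 0: nothing seen yet, q = 1: `first` seen, q = 2: accepting (absorbing).
        if q == 0 and self.first in labels:
            q = 1
        if q == 1 and self.second in labels:  # also covers both labels on one step
            q = 2
        return q

    @staticmethod
    def accepting(q: int) -> bool:
        return q == 2


def joint_step(specs, automaton_states, labels_per_agent):
    """Advance every agent's automaton; reward 1 only on the step its spec is first satisfied."""
    next_states, rewards = [], []
    for spec, q, labels in zip(specs, automaton_states, labels_per_agent):
        q_next = spec.step(q, labels)
        next_states.append(q_next)
        newly_done = spec.accepting(q_next) and not spec.accepting(q)
        rewards.append(1.0 if newly_done else 0.0)
    return tuple(next_states), rewards


if __name__ == "__main__":
    # Agent 0 must visit a then b; agent 1 must visit b then a.
    specs = [EventuallyThen("a", "b"), EventuallyThen("b", "a")]
    states = (0, 0)
    for step_labels in ([{"a"}, {"a"}], [{"b"}, {"b"}], [set(), {"a"}]):
        states, rewards = joint_step(specs, states, [frozenset(lab) for lab in step_labels])
        print(states, rewards)
```

Because each agent's automaton state is part of the joint state, a learner that conditions on it sees a Markovian reward even though the specification depends on history.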

agent, algorithm, specification

2002.06

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Jendele, Lukas, Christen, Sammy, Aksan, Emre, Hilliges, Otmar

Learning Functionally Decomposed Hierarchies for Continuous Control Tasks

Solving long-horizon sequential decision-making tasks in environments with sparse rewards is a longstanding problem in reinforcement learning (RL) research. Hierarchical Reinforcement Learning (HRL) has held the promise of enhancing the capabilities of RL agents by operating on different levels of temporal abstraction. Despite the success of recent works in dealing with inherent nonstationarity and sample complexity, it remains difficult to generalize to unseen environments and to transfer different layers of the policy to other agents. In this paper, we propose a novel HRL architecture, Hierarchical Decompositional Reinforcement Learning (HiDe), which allows decomposition of the hierarchical layers into independent subtasks, yet allows for joint training of all layers in an end-to-end manner. The main insight is to combine a control policy on a lower level with an image-based planning policy on a higher level. We evaluate our method on various complex continuous control tasks, demonstrating that generalization across environments and transfer of higher-level policies, such as from a simple ball to a complex humanoid, can be achieved. Videos are available at https://sites.google.com/view/hide-rl.
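The functional split described above, where a planning layer that sees a top-down map proposes a nearby subgoal and a control layer only ever sees that subgoal relative to the agent, can be illustrated on a toy occupancy grid. In this sketch, breadth-first search stands in for HiDe's learned image-based planner and a proportional controller stands in for the learned low-level policy; both substitutions, the grid encoding ('#' for obstacles) and the `radius` parameter are assumptions.

```python
from collections import deque

WALL = "#"


def bfs_distances(grid, source):
    """4-connected shortest-path lengths from `source` to every reachable free cell."""
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != WALL and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def plan_subgoal(grid, agent, goal, radius=2):
    """Planning layer: among cells the agent can reach in <= radius steps, pick the one
    closest to the goal and return it as an offset relative to the agent."""
    to_goal = bfs_distances(grid, goal)
    from_agent = bfs_distances(grid, agent)
    candidates = [
        (to_goal[cell], (cell[0] - agent[0], cell[1] - agent[1]))
        for cell, d in from_agent.items()
        if d <= radius and cell in to_goal
    ]
    if not candidates:
        raise ValueError("goal is not reachable from the agent")
    return min(candidates)[1]


def control_action(offset, gain=0.5):
    """Control layer: proportional step toward the relative subgoal; it never sees the map."""
    return (gain * offset[0], gain * offset[1])


if __name__ == "__main__":
    grid = ["..#..",
            "..#..",
            "....."]
    offset = plan_subgoal(grid, agent=(0, 1), goal=(0, 3))
    print(offset, control_action(offset))
```

In this toy setup the control layer receives only a relative offset, so it could be replaced by a controller for a different body without changing the planner, which mirrors the kind of layer transfer the abstract describes.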

agent, learning functionally decomposed hierarchy, planning layer

2002.05954

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning, Feb-13-2020

Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic

Ren, Yangang, Duan, Jingliang, Guan, Yang, Li, Shengbo Eben

Reinforcement learning (RL) has achieved remarkable performance in a variety of sequential decision-making and control tasks. However, a common problem is that a learned near-optimal policy tends to overfit to the training environment and may not generalize to situations never encountered during training. In practical applications, environmental randomness can lead to rare but devastating events, which safety-critical systems such as autonomous driving must focus on. In this paper, we introduce a minimax formulation and a distributional framework to improve the generalization ability of RL algorithms, and develop the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm. The minimax formulation seeks an optimal policy under the most severe disturbances from the environment: the protagonist policy maximizes the action-value function while the adversary policy tries to minimize it. The distributional framework learns a state-action return distribution, from which the risk of different returns can be modeled explicitly, yielding a risk-averse protagonist policy and a risk-seeking adversarial policy. We apply our method to decision-making tasks for autonomous vehicles at intersections and test the trained policy in environments distinct from the training environment. Results demonstrate that our method can greatly improve the generalization ability of the protagonist agent to different environmental variations.
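A toy, tabular illustration of how the two ingredients combine when choosing an action: each (protagonist action, adversary action) pair has a return distribution summarized by its mean and standard deviation, the protagonist scores it risk-aversely as mean - beta * std, and then picks the action with the best worst case over adversary actions. The action names, the numbers and the mean-minus-std risk measure are assumptions; Minimax DSAC learns these quantities with neural networks and trains the adversary as a separate policy, none of which is reproduced here.

```python
def risk_averse_score(mean, std, beta):
    """Penalize uncertain returns with a simple mean-minus-beta-std risk measure (assumption)."""
    return mean - beta * std


def minimax_choice(table, beta):
    """Pick the protagonist action whose worst case over adversary actions is best.

    table[protagonist_action][adversary_action] = (mean, std) of the return distribution.
    """
    best_action, best_value = None, float("-inf")
    for action, outcomes in table.items():
        worst = min(risk_averse_score(m, s, beta) for m, s in outcomes.values())
        if worst > best_value:
            best_action, best_value = action, worst
    return best_action, best_value


if __name__ == "__main__":
    # Made-up intersection example: proceed or yield, while another car may cut in.
    table = {
        "proceed": {"no_disturbance": (10.0, 1.0), "cut_in": (6.0, 5.0)},
        "yield": {"no_disturbance": (6.0, 0.5), "cut_in": (5.0, 0.5)},
    }
    print(minimax_choice(table, beta=0.0))  # risk-neutral worst case
    print(minimax_choice(table, beta=1.0))  # risk-averse worst case
```

With beta = 0 the worst-case mean favors proceeding (6.0 versus 5.0); with beta = 1 the high variance of the cut-in outcome makes yielding preferable (4.5 versus 1.0).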

algorithm, return distribution, vehicle

arXiv.org Machine Learning

2002.05502

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Transportation (0.35)
Information Technology (0.35)
Automobiles & Trucks (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Pan, Yangchen, Mei, Jincheng, Farahmand, Amir-massoud

Frequency-based Search-control in Dyna

arXiv.org Artificial Intelligence, Feb-13-2020

Model-based reinforcement learning has been empirically demonstrated to be a successful strategy for improving sample efficiency. In particular, Dyna is an elegant model-based architecture that integrates learning and planning and offers great flexibility in how a model is used. One of the most important components of Dyna is search-control, the process of generating the states or state-action pairs from which the model is queried to acquire simulated experiences. Search-control is critical to learning efficiency. In this work, we propose a simple and novel search-control strategy that searches high-frequency regions of the value function. Our main intuition builds on the Shannon sampling theorem from signal processing, which indicates that a high-frequency signal requires more samples to reconstruct. We empirically show that a high-frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high-frequency regions of the value function to query the model and acquire more samples. We develop a simple strategy to locally measure the frequency of a function by its gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna and conduct experiments showing its properties and effectiveness on benchmark domains.
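A one-dimensional sketch of the search-control idea: estimate how high-frequency the value function is around each candidate state from its first and second derivatives, then draw the states used to query the model with probability proportional to that estimate. Central finite differences stand in for the gradient and Hessian norms (in one dimension these are absolute values), and combining them as a plain sum is an assumption, as is the toy value function. In Dyna, the sampled states would then be passed to the learned model to generate simulated transitions for planning updates.

```python
import math
import random


def local_frequency(v, s, h=1e-3):
    """Central finite-difference estimate of |V'(s)| + |V''(s)| for a scalar state s."""
    grad = (v(s + h) - v(s - h)) / (2 * h)
    hess = (v(s + h) - 2 * v(s) + v(s - h)) / (h * h)
    return abs(grad) + abs(hess)


def search_control(v, candidate_states, k, rng=random):
    """Draw k states for model queries, with probability proportional to local frequency."""
    weights = [local_frequency(v, s) for s in candidate_states]
    return rng.choices(candidate_states, weights=weights, k=k)


if __name__ == "__main__":
    # Toy value function: smooth for s < 0, rapidly oscillating for s >= 0.
    def value(s):
        return math.sin(s) if s < 0 else math.sin(10 * s)

    states = [-1.0, -0.5, 0.5, 1.0]
    picked = search_control(value, states, k=1000, rng=random.Random(0))
    print({s: picked.count(s) for s in states})
```

Most draws land on the oscillating half of the toy function, which is the intended effect: more simulated experience where the value function is hardest to approximate.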

conference paper, international conference, value function

2002.05822

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > Alberta (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stein, Anthony, Maier, Roland, Rosenbauer, Lukas, Hähner, Jörg

XCS Classifier System with Experience Replay

arXiv.org Artificial Intelligence, Feb-13-2020

XCS is the most thoroughly investigated classifier system today. It has strong potential and comes with inherent capabilities for mastering a variety of learning tasks. Besides outstanding successes in various classification and regression tasks, XCS has also proved very effective in certain multi-step environments from the domain of reinforcement learning. In the latter domain especially, recent advances have been driven mainly by algorithms that model their policies with deep neural networks, among which the Deep Q-Network (DQN) is a prominent representative. Experience Replay (ER) is one of the crucial factors in the DQN's successes, since it stabilizes the training of the neural-network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage the raw experiences stored so far. To bridge this gap, this paper investigates the benefits of extending XCS with ER. On the one hand, we demonstrate that for single-step tasks ER offers substantial potential for improving sample efficiency. On the other hand, we reveal that the use of ER might further aggravate well-studied, still unsolved issues of XCS when it is applied to sequential decision problems that demand long action chains.
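A heavily simplified sketch of experience replay for a single-step, XCS-style learner: raw (state, action, reward) experiences are kept in a bounded buffer, and each replayed sample re-forms the action set (matching classifiers that advocate the stored action) and applies the Widrow-Hoff prediction update p <- p + beta * (r - p). Only the payoff prediction is updated here; prediction error, fitness, the genetic algorithm, covering and subsumption are omitted, and the buffer capacity, batch size, beta and initial prediction are hypothetical values, not the paper's settings.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str            # ternary string over {'0', '1', '#'}; '#' matches either bit
    action: int
    prediction: float = 10.0  # hypothetical initial payoff prediction

    def matches(self, state: str) -> bool:
        return all(c == "#" or c == s for c, s in zip(self.condition, state))


class ReplayBuffer:
    """Fixed-capacity store of raw (state, action, reward) experiences for single-step tasks."""

    def __init__(self, capacity: int):
        self.items = deque(maxlen=capacity)  # oldest experiences are evicted first

    def add(self, state: str, action: int, reward: float) -> None:
        self.items.append((state, action, reward))

    def sample(self, batch_size: int, rng: random.Random) -> list:
        return rng.sample(list(self.items), min(batch_size, len(self.items)))


def replay_update(population, buffer, batch_size, rng, beta=0.2):
    """Re-apply the Widrow-Hoff prediction update to the action set of each replayed experience."""
    for state, action, reward in buffer.sample(batch_size, rng):
        action_set = [cl for cl in population if cl.action == action and cl.matches(state)]
        for cl in action_set:
            cl.prediction += beta * (reward - cl.prediction)


if __name__ == "__main__":
    rng = random.Random(0)
    population = [Classifier("1#", action=1), Classifier("0#", action=1), Classifier("##", action=0)]
    buffer = ReplayBuffer(capacity=500)
    buffer.add("10", 1, 1000.0)  # a single rewarded single-step experience
    for _ in range(20):
        replay_update(population, buffer, batch_size=8, rng=rng)
    print([round(cl.prediction, 1) for cl in population])
```

Replaying one stored experience many times drives the matching classifier's prediction toward its reward without new environment interaction, which is the sample-efficiency benefit claimed for single-step tasks.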

classifier system

2002.05628

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > Experimental Study (0.30)

Industry:

Education (0.93)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)