AITopics

1905.02825

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Santucci, Vieri Giuliano, Cartoni, Emilio, da Silva, Bruno Castro, Baldassarre, Gianluca

Autonomous Open-Ended Learning of Interdependent Tasks

arXiv.org Artificial Intelligence, May-7-2019

Autonomy is fundamental for artificial agents acting in complex real-world scenarios. The acquisition of many different skills is pivotal to foster versatile autonomous behaviour and is thus a main objective for robotics and machine learning. Intrinsic motivations have proven able to generate a task-agnostic signal that drives the autonomous acquisition of multiple policies in settings requiring the learning of multiple tasks. However, in real-world scenarios tasks may be interdependent, so that some of them may constitute the precondition for learning others. Although different strategies have been used to tackle the acquisition of interdependent/hierarchical tasks, fully autonomous open-ended learning in these scenarios is still an open question. Building on previous research within the framework of intrinsically-motivated open-ended learning, we propose an architecture for robot control that tackles this problem from the point of view of decision making, i.e. treating the selection of tasks as a Markov Decision Process where the system selects the policies to be trained in order to maximise its competence over all the tasks. The system is then tested with a humanoid robot solving multiple interdependent reaching tasks.
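As a rough illustration of task selection framed as a decision problem, the sketch below uses tabular Q-learning to choose which task to train next and rewards the selector with the resulting increase in competence. Everything here is an assumption for illustration, not the authors' architecture: the names `TaskSelector` and `run_toy`, the state encoding (which tasks are mastered), the fixed competence increments, and the toy precondition that task 1 can only improve after task 0 is mastered.

```python
import random
from collections import defaultdict


class TaskSelector:
    """Hypothetical tabular Q-learning selector that picks which task to train.

    State: tuple of booleans marking mastered tasks (an assumption).
    Reward: increase in summed competence after training (an assumption).
    """

    def __init__(self, n_tasks, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_tasks = n_tasks
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, task) -> estimated value
        self.rng = random.Random(seed)

    def select(self, state):
        # Epsilon-greedy choice of the next task to train.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_tasks)
        return max(range(self.n_tasks), key=lambda t: self.q[(state, t)])

    def update(self, state, task, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, t)] for t in range(self.n_tasks))
        td_error = reward + self.gamma * best_next - self.q[(state, task)]
        self.q[(state, task)] += self.alpha * td_error


def run_toy(episodes=200, steps=10, seed=0):
    """Toy setting: task 1 can only improve once task 0 is mastered."""
    selector = TaskSelector(n_tasks=2, seed=seed)
    for _ in range(episodes):
        competence = [0.0, 0.0]
        for _ in range(steps):
            state = tuple(c >= 1.0 for c in competence)
            task = selector.select(state)
            before = sum(competence)
            if task == 0 or competence[0] >= 1.0:  # precondition check
                competence[task] = min(1.0, competence[task] + 0.25)
            reward = sum(competence) - before  # competence improvement
            next_state = tuple(c >= 1.0 for c in competence)
            selector.update(state, task, reward, next_state)
    return selector
```

In this toy run the selector learns to prefer task 0 while nothing is mastered, because training task 1 before its precondition is met yields no competence gain.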

artificial intelligence, machine learning, reinforcement learning

1905.0269

Country:

Europe (0.46)
South America > Brazil > Rio Grande do Sul (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

Sorokin, Artyom Y., Burtsev, Mikhail S.

Continual and Multi-task Reinforcement Learning With Shared Episodic Memory

arXiv.org Artificial Intelligence, May-7-2019

Episodic memory plays an important role in the behavior of animals and humans. It allows the accumulation of information about the current state of the environment in a task-agnostic way. This episodic representation can later be accessed by downstream tasks to make their execution more efficient. In this work, we introduce a neural architecture with shared episodic memory (SEM) for learning and sequentially executing multiple tasks. We explicitly split the encoding of episodic memory and task-specific memory into separate recurrent sub-networks. An agent augmented with SEM was able to effectively reuse episodic knowledge collected during other tasks to improve its policy on the current task in the Taxi problem. Repeated use of the episodic representation in continual learning experiments facilitated the acquisition of novel skills in the same environment.

arxiv preprint arxiv, machine learning, reinforcement learning

1905.02662

Genre: Research Report (0.82)

Industry:

Health & Medicine > Consumer Health (1.00)
Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Hoel, Carl-Johan, Driggs-Campbell, Katherine, Wolff, Krister, Laine, Leo, Kochenderfer, Mykel J.

Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving

arXiv.org Artificial Intelligence, May-6-2019

Tactical decision making for autonomous driving is challenging due to the diversity of environments, the uncertainty in the sensor information, and the complex interaction with other road users. This paper introduces a general framework for tactical decision making, which combines the concepts of planning and learning, in the form of Monte Carlo tree search and deep reinforcement learning. The method is based on the AlphaGo Zero algorithm, which is extended to a domain with a continuous state space where self-play cannot be used. The framework is applied to two different highway driving cases in a simulated environment and it is shown to perform better than a commonly used baseline method. The strength of combining planning and learning is also illustrated by a comparison to using the Monte Carlo tree search or the neural network policy separately.
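The AlphaGo Zero tree policy that the method builds on selects, at each node, the action maximizing $Q(s,a) + c_{\mathrm{puct}} P(s,a) \sqrt{\sum_b N(s,b)} / (1 + N(s,a))$, where $P$ is the neural network's prior and $N$ counts visits. The sketch below implements only that generic selection rule and a single-agent value backup (values are not negated between levels, since there is no opponent). It is not the authors' extension to continuous state spaces, and the names `Edge`, `puct_select`, `backup`, the driving action labels and the `c_puct` value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics stored for one (state, action) edge of the search tree."""
    prior: float            # P(s, a): prior probability from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a): sum of backed-up values

    @property
    def q(self):
        # Mean action value Q(s, a); 0 for unvisited edges.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges, c_puct=1.0):
    """Return the action maximizing Q(s, a) + U(s, a), AlphaGo Zero style."""
    total_visits = sum(e.visits for e in edges.values())

    def score(action):
        e = edges[action]
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=score)


def backup(path, value):
    """Propagate a leaf value estimate to every edge on the visited path."""
    for edge in path:
        edge.visits += 1
        edge.value_sum += value
```

With equal priors, the exploration term steers the search toward the less-visited edge even when its mean value is lower; setting `c_puct=0` reduces selection to the greedy choice on $Q$.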

artificial intelligence, machine learning, reinforcement learning

1905.0268

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment (1.00)
Government > Military (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Zap, Alexander, Joppen, Tobias, Fürnkranz, Johannes

Deep Ordinal Reinforcement Learning

arXiv.org Machine Learning, May-6-2019

Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical variants on a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered, simpler-to-design reward signals.
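One way to compare actions without turning ordinal rewards into numbers is to keep, per action, a probability distribution over the ordered reward categories and ask how likely one action is to yield a better category than another. The sketch below computes that pairwise probability (ties counted as one half) and averages it over the competing actions in a Borda-style score. Treat this as an illustrative assumption rather than the exact ordinal Q-learning update from the paper; the function names and the example categories (lose, draw, win) are hypothetical.

```python
from itertools import product


def superiority(p, q):
    """P(X > Y) + 0.5 * P(X == Y) for independent X ~ p, Y ~ q.

    p and q are probability lists over ordinal categories ordered worst -> best.
    """
    total = 0.0
    for i, j in product(range(len(p)), range(len(q))):
        if i > j:
            total += p[i] * q[j]
        elif i == j:
            total += 0.5 * p[i] * q[j]
    return total


def borda_scores(dists):
    """Average superiority of each action over every other action."""
    actions = list(dists)
    if len(actions) < 2:
        raise ValueError("need at least two actions to compare")
    return {
        a: sum(superiority(dists[a], dists[b]) for b in actions if b != a)
        / (len(actions) - 1)
        for a in actions
    }


def greedy_ordinal_action(dists):
    """Pick the action whose reward distribution wins most pairwise comparisons."""
    scores = borda_scores(dists)
    return max(scores, key=scores.get)
```

For example, with categories ordered as lose, draw, win, an action with distribution `[0.0, 0.5, 0.5]` beats one with `[0.5, 0.5, 0.0]` with probability 0.875, and no numeric value was ever assigned to the categories.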

artificial intelligence, machine learning, reinforcement learning

1905.02005

Country: Europe > Germany (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Khadka, Shauharda, Majumdar, Somdeb, Nassar, Tarek, Dwiel, Zach, Tumer, Evren, Miret, Santiago, Liu, Yinyin, Tumer, Kagan

Collaborative Evolutionary Reinforcement Learning

arXiv.org Machine Learning, May-6-2019

Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore, thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners (typically proven algorithms like TD3) optimize over varying time horizons, leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
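Distributing computation to favor the best learners, framed as online algorithm selection, can be treated as a multi-armed bandit problem. The sketch below assumes a UCB1-style score (mean performance plus an exploration bonus) and hands out rollout workers one at a time, counting each assignment before scoring the next. The function name `ucb_allocate`, the exploration constant `c`, and this worker-allocation scheme are illustrative assumptions, not CERL's published resource manager.

```python
import math


def ucb_allocate(values, counts, n_workers, c=1.0):
    """Split n_workers rollout workers among learners by a UCB1-style score.

    values[i]: running mean performance of learner i.
    counts[i]: how many workers learner i has received so far.
    Returns a list with the number of new workers given to each learner.
    """
    counts = list(counts)  # local copy, updated as workers are assigned
    allocation = [0] * len(values)
    for _ in range(n_workers):
        total = sum(counts)

        def score(i):
            if counts[i] == 0:
                return float("inf")  # try every learner at least once
            return values[i] + c * math.sqrt(math.log(total) / counts[i])

        best = max(range(len(values)), key=score)
        allocation[best] += 1
        counts[best] += 1
    return allocation
```

A small `c` concentrates workers on the learner with the best mean performance; a larger `c` spreads them out so that weaker learners keep being re-evaluated.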

learner, machine learning, reinforcement learning

1905.00976

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hafez, Muhammad Burhan, Weber, Cornelius, Kerzel, Matthias, Wermter, Stefan

Curious Meta-Controller: Adaptive Alternation between Model-Based and Model-Free Control in Deep Reinforcement Learning

arXiv.org Machine Learning, May-5-2019

Recent success in deep reinforcement learning for continuous control has been dominated by model-free approaches, which, unlike model-based approaches, do not suffer from the representational limitations of making assumptions about the world dynamics or from the model errors that are inevitable in complex domains. However, they require far more experience than model-based approaches, which are typically more sample-efficient. We propose to combine the benefits of the two approaches in an integrated approach called Curious Meta-Controller. Our approach alternates adaptively between model-based and model-free control using curiosity feedback based on the learning progress of a neural model of the dynamics in a learned latent space. We demonstrate that our approach can significantly improve sample efficiency and achieve near-optimal performance on learning robotic reaching and grasping tasks from raw-pixel input in both dense and sparse reward settings.
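A curiosity signal based on learning progress can be estimated from the dynamics model's recent prediction errors: if the errors are falling, the model is still improving. The sketch below computes progress as the drop in mean error between the older and newer halves of a sliding window. The names `LearningProgressMonitor` and `choose_controller`, the window size, and in particular the switching rule (model-based while progress exceeds a threshold, model-free otherwise) are assumptions for illustration, not the published controller.

```python
from collections import deque


class LearningProgressMonitor:
    """Tracks the learning progress of a dynamics model from its prediction errors.

    Progress = mean error over the older half of a sliding window minus mean
    error over the newer half (positive means the model is improving).
    """

    def __init__(self, window=20):
        if window < 2 or window % 2:
            raise ValueError("window must be an even number >= 2")
        self.errors = deque(maxlen=window)

    def add(self, error):
        self.errors.append(error)

    def progress(self):
        if len(self.errors) < self.errors.maxlen:
            return 0.0  # not enough data yet
        half = self.errors.maxlen // 2
        errs = list(self.errors)
        return sum(errs[:half]) / half - sum(errs[half:]) / half


def choose_controller(monitor, threshold=0.0):
    # Hypothetical rule: use the model-based controller while the model is
    # improving faster than `threshold`, otherwise use the model-free policy.
    return "model_based" if monitor.progress() > threshold else "model_free"
```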

artificial intelligence, machine learning, reinforcement learning

1905.01718

Country: Europe > Germany > Hamburg (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Francois-Lavet, Vincent, Rabusseau, Guillaume, Pineau, Joelle, Ernst, Damien, Fonteneau, Raphael

On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability

Journal of Artificial Intelligence Research, May-5-2019

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes how a smaller state representation decreases the risk of overfitting while potentially increasing the asymptotic bias. This analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. The theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may improve the tradeoff between asymptotic bias and overfitting in the partially observable context.
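A truncated-history representation treats the last $h$ observations as the agent's state. The sketch below builds that representation and counts how many distinct states a trajectory produces. A larger $h$ yields more distinct states to estimate from the same data, which is one intuition for the overfitting side of the tradeoff. The function names and the `None` padding at the start of an episode are assumptions.

```python
from collections import deque


def truncated_histories(observations, h):
    """Yield the state at each step: the last h observations, padded with None."""
    window = deque([None] * h, maxlen=h)
    for obs in observations:
        window.append(obs)
        yield tuple(window)


def count_distinct_states(observations, h):
    """Number of distinct truncated-history states visited along a trajectory."""
    return len(set(truncated_histories(observations, h)))
```

On the trajectory `[0, 1, 0, 1, 1, 0]`, a history length of 0 collapses everything into one state, $h = 1$ gives two states and $h = 2$ gives four.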

overfitting and asymptotic bias, pomdp, state representation

doi: 10.1613/jair.1.11478

AI Access Foundation

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Alberta (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Research Report (0.67)

Industry:

Energy > Power Industry (0.68)
Energy > Renewable > Hydrogen (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Fakoor, Rasool, Chaudhari, Pratik, Smola, Alexander J.

P3O: Policy-on Policy-off Policy Optimization

arXiv.org Machine Learning, May-5-2019

On-policy reinforcement learning (RL) algorithms have high sample complexity, while off-policy algorithms are difficult to tune. Merging the two holds the promise of efficient algorithms that generalize across diverse environments. It is, however, challenging in practice to find suitable hyper-parameters that govern this trade-off. This paper develops a simple algorithm named P3O that interleaves off-policy updates with on-policy updates. P3O uses the effective sample size between the behavior policy and the target policy to control how far they can be from each other and does not introduce any additional hyper-parameters. Extensive experiments on the Atari-2600 and MuJoCo benchmark suites show that this simple technique is highly effective in reducing the sample complexity of state-of-the-art algorithms.
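The effective sample size (ESS) of importance weights $w_i = \pi(a_i \mid s_i) / \mu(a_i \mid s_i)$ is commonly defined as $(\sum_i w_i)^2 / \sum_i w_i^2$. Dividing it by the number of samples $n$ gives a value in $(0, 1]$ that equals 1 only when all weights are equal, for example when the target and behavior policies agree on every sample. The sketch below computes this normalized ESS from per-sample action probabilities. How P3O turns the value into its update is described in the paper and is not reproduced here; the function name and inputs are hypothetical.

```python
def normalized_ess(target_probs, behavior_probs):
    """Normalized effective sample size of importance weights w = pi / mu.

    target_probs[i]:   pi(a_i | s_i) under the target policy.
    behavior_probs[i]: mu(a_i | s_i) under the behavior policy (must be > 0).
    """
    if not target_probs or len(target_probs) != len(behavior_probs):
        raise ValueError("need two non-empty sequences of equal length")
    weights = [p / q for p, q in zip(target_probs, behavior_probs)]
    n = len(weights)
    return sum(weights) ** 2 / (n * sum(w * w for w in weights))
```

A value near 1 indicates the replayed off-policy data is still close to on-policy; a value near 0 indicates that a few samples dominate the importance-weighted estimate.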

artificial intelligence, machine learning, reinforcement learning

1905.01756

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Dwiel, Zach, Candadai, Madhavun, Phielipp, Mariano J., Bansal, Arjun K.

Hierarchical Policy Learning is Sensitive to Goal Space Design

arXiv.org Artificial Intelligence, May-4-2019

Hierarchy in reinforcement learning agents allows for control at multiple time scales, yielding improved sample efficiency, the ability to deal with long time horizons, and transferability of sub-policies to tasks outside the training distribution. It is often implemented as a master policy providing goals to a sub-policy. Ideally, the goal spaces would be learned; however, the properties of optimal goal spaces remain unknown, and consequently there is as yet no method to learn optimal goal spaces. Motivated by this, we systematically analyze how various modifications to the ground-truth goal space affect learning in hierarchical models, with the aim of identifying important properties of optimal goal spaces. Our results show that, while rotation of ground-truth goal spaces and noise had no effect, having additional unnecessary factors significantly impaired learning in hierarchical models.
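The three kinds of goal-space modification named in the abstract can be pictured as simple transforms of a ground-truth goal vector. The sketch below applies a 2-D rotation, additive Gaussian noise, and appended distractor factors that carry no task information. The function names, the dimensionality, the noise model and the distractor distribution are assumptions, not the paper's experimental settings.

```python
import math
import random


def rotate_goal(goal, angle):
    """Rotate a 2-D goal (x, y) by `angle` radians about the origin."""
    x, y = goal
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def add_noise(goal, sigma, rng):
    """Perturb every goal coordinate with zero-mean Gaussian noise."""
    return tuple(g + rng.gauss(0.0, sigma) for g in goal)


def add_distractors(goal, n_extra, rng):
    """Append n_extra uniformly random factors unrelated to the task."""
    return tuple(goal) + tuple(rng.uniform(-1.0, 1.0) for _ in range(n_extra))
```

One intuition consistent with the reported results is that a rotation is invertible and preserves distances, so it changes only how goals are expressed. Distractor factors, by contrast, enlarge the space the sub-policy has to cover without adding task-relevant information.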

artificial intelligence, machine learning, reinforcement learning

1905.01537

Country: North America > United States (0.69)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)