AITopics

1905.05787

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.92)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)
Health & Medicine > Therapeutic Area > Immunology (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Ramesh, Rahul, Tomar, Manan, Ravindran, Balaraman

Successor Options: An Option Discovery Framework for Reinforcement Learning

The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options, that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor Representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces easily. Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor Representations and building options, which is useful when robust Successor Representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1905.05731

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Doerr, Andreas, Volpp, Michael, Toussaint, Marc, Trimpe, Sebastian, Daniel, Christian

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

estimator, exploration, surrogate model, (14 more...)

1905.0571

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Malazgirt, Gorker Alp, Unsal, Osman S., Kestelman, Adrian Cristal

TauRieL: Targeting Traveling Salesman Problem with a deep reinforcement learning inspired architecture

arXiv.org Artificial IntelligenceMay-14-2019

In this paper, we propose TauRieL and target Traveling Salesman Problem (TSP) since it has broad applicability in theoretical and applied sciences. TauRieL utilizes an actor-critic inspired architecture that adopts ordinary feedforward nets to obtain a policy update vector $v$. Then, we use $v$ to improve the state transition matrix from which we generate the policy. Also, the state transition matrix allows the solver to initialize from precomputed solutions such as nearest neighbors. In an online learning setting, TauRieL unifies the training and the search where it can generate near-optimal results in seconds. The input to the neural nets in the actor-critic architecture are raw 2-D inputs, and the design idea behind this decision is to keep neural nets relatively smaller than the architectures with wide embeddings with the tradeoff of omitting any distributed representations of the embeddings. Consequently, TauRieL generates TSP solutions two orders of magnitude faster per TSP instance as compared to state-of-the-art offline techniques with a performance impact of 6.1\% in the worst case.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1905.05567

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > Online (0.66)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.84)

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

Huang, Chen, Zhai, Shuangfei, Talbott, Walter, Bautista, Miguel Angel, Sun, Shih-Yu, Guestrin, Carlos, Susskind, Josh

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample efficient reinforcement learning approach for adapting the loss dynamically during training. We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state-of-the-art on a diverse set of tasks. Importantly, our method is applicable to a wide range of loss functions and evaluation metrics. Furthermore, the learned policies are transferable across tasks and data, demonstrating the versatility of the method.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1905.05895

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Soemers, Dennis J. N. J., Piette, Éric, Stephenson, Matthew, Browne, Cameron

Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-playing processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of resulting policies, in a variety of board games.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1905.05809

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
(2 more...)

Manela, Binyamin, Biess, Armin

Bias-Reduced Hindsight Experience Replay with Virtual Goal Prioritization

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. The algorithm treats every failure as a success for an alternative (virtual) goal that has been achieved in the episode. Virtual goals are randomly selected, irrespective of which are most instructive for the agent. In this paper, we present two improvements over the existing HER algorithm. First, we prioritize virtual goals from which the agent will learn more valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure, which expresses how well the agent will be able to generalize from that virtual goal to actual goals. Secondly, we reduce existing bias in HER by the removal of misleading samples. To test our algorithms, we built two challenging environments with sparse reward functions. Our empirical results in both environments show vast improvement in the final success rate and sample efficiency when compared to the original HER algorithm.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1905.05498

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

Son, Kyunghwan, Kim, Daewoo, Kang, Wan Ju, Hostallero, David Earl, Yi, Yung

We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the centralized training with decentralized execution (CTDE) regime popularized recently. However, VDN and QMIX are representative examples that use the idea of factorization of the joint action-value function into individual ones for decentralized execution. VDN and QMIX address only a fraction of factorizable MARL tasks due to their structural constraint in factorization such as additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes on a new approach to transforming the original joint action-value function into an easily factorizable one, with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than does previous methods. Our experiments for the tasks of multi-domain Gaussian-squeeze and modified predator-prey demonstrate QTRAN's superior performance with especially larger margins in games whose payoffs penalize non-cooperative behavior more aggressively.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

1905.05408

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.83)

#artificialintelligenceMay-13-2019, 09:10:04 GMT

Generalizable Deep Reinforcement Learning

Transfer learning is all the rage in the machine learning community these days. Transfer learning serves as the basis for many of the managed AutoML services that Google, Salesforce, IBM, and Azure provide. It now figures prominently in the latest NLP research -- appearing in Google's Bidirectional Encoder Representations from Transformers (BERT) model and in Sebastian Ruder and Jeremy Howard's Universal Language Model Fine-tuning for Text Classification (ULMFIT). As Sebastian writes in his blog post, 'NLP's ImageNet moment has arrived': We're also starting to see examples of neural networks that can handle multiple tasks using transfer learning across domains. Paras Chopra has an excellent tutorial for one PyTorch network that can conduct an image search based on a textual description, search for similar images and words, and write captions for images (link to his post below). The main question at hand is: could transfer learning have applications within reinforcement learning?

artificial intelligence, generalizable deep reinforcement learning, machine learning, (3 more...)

#artificialintelligence

Industry: Information Technology (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

RobohubMay-13-2019, 00:42:46 GMT

Robots that learn to adapt

Humans have the ability to seamlessly adapt to changes in their environments: adults can learn to walk on crutches in just a few seconds, people can adapt almost instantaneously to picking up an object that is unexpectedly heavy, and children who can walk on flat ground can quickly adapt their gait to walk uphill without having to relearn how to walk. This adaptation is critical for functioning in the real world. Robots, on the other hand, are typically deployed with a fixed behavior (be it hard-coded or learned), allowing them succeed in specific settings, but leading to failure in others: experiencing a system malfunction, encountering a new terrain or environment changes such as wind, or needing to cope with a payload or other unexpected perturbations. The idea behind our latest research is that the mismatch between predicted and observed recent states should inform the robot to update its model into one that more accurately describes the current situation. Noticing our car skidding on the road, for example, informs us that our actions are having a different effect than expected, and thus allows us to plan our consequent actions accordingly (Figure 1).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Robohub

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)