Goto

Collaborating Authors

 Reinforcement Learning


Machine Learning Practical: 6 Real-World Applications

#artificialintelligence

Free Coupon Discount - Machine Learning Practical: 6 Real-World Applications, Machine Learning - Get Your Hands Dirty by Solving Real Industry Challenges with Python Created by Kirill Eremenko, Hadelin de Ponteves, Dr. Ryan Ahmed, Ph.D., MBA, SuperDataScience Team, Rony Sulca Students also bought Deep Learning: Advanced Computer Vision (GANs, SSD, More!) Deep Learning: GANs and Variational Autoencoders Artificial Intelligence: Reinforcement Learning in Python Natural Language Processing with Deep Learning in Python Advanced AI: Deep Reinforcement Learning in Python Data Science: Natural Language Processing (NLP) in Python Preview this Udemy Course GET COUPON CODE Description So you know the theory of Machine Learning and know how to create your first algorithms. There are tons of courses out there about the underlying theory of Machine Learning which don't go any deeper – into the applications. This course is not one of them. Are you ready to apply all of the theory and knowledge to real life Machine Learning challenges? We gathered best industry professionals with tons of completed projects behind.


Reinforcement learning and reasoning

#artificialintelligence

Reinforcement learning has seen a lot of progress in recent years. From DeepMind success with teaching machines how to play Atari games, then AlphaGo beating world champions in Go to recent OpenAI's progress on Dota 2, a multiplayer game where players divided into two teams compete with each other. The common thread is an artificial agent operating in a virtual world, where the prize is clear (e.g. On the other hand people are experimenting with AI agents operating in real-world. Each clip of Boston Dynamics gets a lot of press, showing robots performing amazing stunts, as you can see yourself here or here.


Disentangling causal effects for hierarchical reinforcement learning

arXiv.org Artificial Intelligence

Exploration and credit assignment under sparse rewards are still challenging problems. We argue that these challenges arise in part due to the intrinsic rigidity of operating at the level of actions. Actions can precisely define how to perform an activity but are ill-suited to describe what activity to perform. Instead, causal effects are inherently composable and temporally abstract, making them ideal for descriptive tasks. By leveraging a hierarchy of causal effects, this study aims to expedite the learning of task-specific behavior and aid exploration. Borrowing counterfactual and normality measures from causal literature, we disentangle controllable effects from effects caused by other dynamics of the environment. We propose CEHRL, a hierarchical method that models the distribution of controllable effects using a Variational Autoencoder. This distribution is used by a high-level policy to 1) explore the environment via random effect exploration so that novel effects are continuously discovered and learned, and to 2) learn task-specific behavior by prioritizing the effects that maximize a given reward function. In comparison to exploring with random actions, experimental results show that random effect exploration is a more efficient mechanism and that by assigning credit to few effects rather than many actions, CEHRL learns tasks more rapidly.


Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning

arXiv.org Machine Learning

Reinforcement learning (RL) and planning in Markov decision processes (MDPs) is one type of dynamic decisionmaking problem (Puterman, 1994; Bertsekas & Tsitsiklis, 1996; sut, 1998). While the typical objective is to maximize the expected cumulative reward, risk-aware decision-making has attracted attention in real-world applications, such as finance, robotics, and playing games (Geibel & Wysotzki, 2005; García & Fernández, 2015). The notion of risk in RL is related to the fact that even an optimal policy may perform poorly in some cases owing to the stochastic nature of the problem. To capture the risk, various criteria have been proposed, such as Value at Risk (Luenberger, 1998; Chow & Ghavamzadeh, 2014; Chow et al., 2017) and variance (Markowitz, 1952; Markowitz et al., 2000; Tamar et al., 2012; L.A. & Ghavamzadeh, 2013). Among them, we focus on the mean-variance tradeoff in RL problems. Typical mean-variance RL (MVRL) methods attempt to maximize the expected cumulative reward while maintaining the variance threshold (Tamar et al., 2012; L.A. & Ghavamzadeh, 2013; Prashanth & Ghavamzadeh, 2016; Xie et al., 2018; Bisi et al., 2020; Zhang et al., 2020). However, most existing MVRL methods suffer from high computational costs owing to the double sampling issue when approximating the gradient of the variance term (Tamar et al., 2012; L.A. & Ghavamzadeh, 2013; Prashanth & Ghavamzadeh, 2016). To avoid the double sampling issue, Xie et al. (2018) proposed a method based on the Legendre-Fenchel duality (Boyd & Vandenberghe, 2004). Although the method does not suffer from the double sampling issue, we cannot apply a standard policy gradient method and must use a coordinate descent algorithm.


Data-Efficient Reinforcement Learning with Self-Predictive Representations

arXiv.org Machine Learning

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https: //github.com/mila-iqia/spr.


Episodic Memory for Learning Subjective-Timescale Models

arXiv.org Artificial Intelligence

In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment even though planning often requires reasoning over multi-step timescales, with intermediate states either unnecessary, or worse, accumulating prediction error. In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context. Inspired by the recent works on human time perception, we devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale - over which it learns world dynamics and over which future planning is performed. We implement this in the framework of active inference and demonstrate that the resulting subjective-timescale model (STM) can systematically vary the temporal extent of its predictions while preserving the same computational efficiency. Additionally, we show that STM predictions are more likely to introduce future salient events (for example new objects coming into view), incentivising exploration of new areas of the environment. As a result, STM produces more informative action-conditioned roll-outs that assist the agent in making better decisions. We validate significant improvement in our STM agent's performance in the Animal-AI environment against a baseline system, trained using the environment's objective-timescale dynamics. An agent endowed with a model of its environment has the ability to predict the consequences of its actions and perform planning into the future before deciding on its next move. Models can allow agents to simulate the possible action-conditioned futures from their current state, even if the state was never visited during learning. As a result, model-based approaches can provide agents with better generalization abilities across both states and tasks in an environment, compared to their model-free counterparts (Racanière et al., 2017; Mishra et al., 2017).


Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

arXiv.org Artificial Intelligence

Intelligent robots need to achieve abstract objectives using concrete, spatiotemporally complex sensory information and motor control. Tabula rasa deep reinforcement learning (RL) has tackled demanding tasks in terms of either visual, abstract, or physical reasoning, but solving these jointly remains a formidable challenge. One recent, unsolved benchmark task that integrates these challenges is Mujoban, where a robot needs to arrange 3D warehouses generated from 2D Sokoban puzzles. We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures. Unlike classic architectures that are typically model-based, we use only model-free modules trained with RL or supervised learning. We find that our modular RL approach dramatically outperforms the state-of-the-art monolithic RL agent on Mujoban. Further, learned modules can be reused when, e.g., using a different robot platform to solve the same task. Together our results give strong evidence for the importance of research into modular RL designs. Project website: https://sites.google.com/view/modular-rl/


A VR film/game with AI characters can be different every time you watch/play

#artificialintelligence

Gagliano previously won the first ever Emmy for a VR experience in 2015. Now he and producer David Oppenheim, who works at the National Film Board of Canada, are experimenting with a kind of storytelling they call dynamic film. "We see Agence as a sort of silent-era dynamic film," says Oppenheim. Agence was debuted at the Venice International Film Festival last month and was released this week to watch/play via Steam, an online video game platform. The basic plot revolves around a group of creatures and their appetite for a mysterious plant that appears on their planet: can they control their desire or will they destabilize the planet and get tipped to their doom?


Meta-Model-Based Meta-Policy Optimization

arXiv.org Machine Learning

Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and has demonstrated its high sample efficiency. However, in previous MBRL for meta-learning settings, policies are optimized via rollouts that fully rely on a predictive model of an environment. Thus, its performance in a real environment tends to degrade when the predictive model is inaccurate. In this paper, we prove that performance degradation can be suppressed by using branched meta-rollouts. On the basis of this theoretical analysis, we propose Meta-Model-based Meta-Policy Optimization (M3PO), in which the branched meta-rollouts are used for policy optimization. We demonstrate that M3PO outperforms existing meta reinforcement learning methods in continuous-control benchmarks.


Interactive Reinforcement Learning for Feature Selection with Decision Tree in the Loop

arXiv.org Machine Learning

We study the problem of balancing effectiveness and efficiency in automated feature selection. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection is mostly efficient, but difficult to identify the best subset; 2) the emerging reinforced feature selection automatically navigates to the best subset, but is usually inefficient. Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this dilemma, we aim to develop a novel feature space navigation method. In our preliminary work, we leveraged interactive reinforcement learning to accelerate feature selection by external trainer-agent interaction. In this journal version, we propose a novel interactive and closed-loop architecture to simultaneously model interactive reinforcement learning (IRL) and decision tree feedback (DTF). Specifically, IRL is to create an interactive feature selection loop and DTF is to feed structured feature knowledge back to the loop. First, the tree-structured feature hierarchy from decision tree is leveraged to improve state representation. In particular, we represent the selected feature subset as an undirected graph of feature-feature correlations and a directed tree of decision features. We propose a new embedding method capable of empowering graph convolutional network to jointly learn state representation from both the graph and the tree. Second, the tree-structured feature hierarchy is exploited to develop a new reward scheme. In particular, we personalize reward assignment of agents based on decision tree feature importance. In addition, observing agents' actions can be feedback, we devise another reward scheme, to weigh and assign reward based on the feature selected frequency ratio in historical action records. Finally, we present extensive experiments on real-world datasets to show the improved performance.