AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

[D] What are you currently 'stuck' on right now / these days? • r/MachineLearning

@machinelearnbotNov-18-2017, 13:50:21 GMT

Currently I'm searching for a Reinforcement Learning toolkit for autonomous driving to test the influence of several safety aspects during learning as a reward function. So far I have tested OpenAI Gym with the "Neon racer" environment, which does not provide those information. Are there any other toolkits you would suggest me for this purpose?

large language model, machine learning, reinforcement learning, (8 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.35)
(2 more...)

Add feedback

Object Manipulation Learning by Imitation

Zeng, Zhen, Kuipers, Benjamin

arXiv.org Artificial IntelligenceNov-18-2017

We aim to enable robot to learn object manipulation by imitation. Given external observations of demonstrations on object manipulations, we believe that two underlying problems to address in learning by imitation is 1) segment a given demonstration into skills that can be individually learned and reused, and 2) formulate the correct RL (Reinforcement Learning) problem that only considers the relevant aspects of each skill so that the policy for each skill can be effectively learned. Previous works made certain progress in this direction, but none has taken private information into account. The public information is the information that is available in the external observations of demonstration, and the private information is the information that are only available to the agent that executes the actions, such as tactile sensations. Our contribution is that we provide a method for the robot to automatically segment the demonstration of object manipulations into multiple skills, and formulate the correct RL problem for each skill, and automatically decide whether the private information is an important aspect of each skill based on interaction with the world. Our experiment shows that our robot learns to pick up a block, and stack it onto another block by imitating an observed demonstration. The evaluation is based on 1) whether the demonstration is reasonably segmented, 2) whether the correct RL problems are formulated, 3) and whether a good policy is learned.

machine learning, reinforcement learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

1603.00964

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Lecture 14 Deep Reinforcement Learning

#artificialintelligenceNov-16-2017, 21:58:01 GMT

In Lecture 14 we move from supervised learning to reinforcement learning (RL), in which an agent must learn to interact with an environment in order to maximize its reward. We discuss different algorithms for reinforcement learning including Q-Learning, policy gradients, and Actor-Critic. We show how deep reinforcement learning has been used to play Atari games and to achieve super-human Go performance in AlphaGo. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka "deep learning") approaches have greatly advanced the performance of these state-of-the-art visual recognition systems.

machine learning, reinforcement learning, stanford, (11 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.26)

Genre: Instructional Material > Course Syllabus & Notes (0.41)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Add feedback

Variational Adaptive-Newton Method for Explorative Learning

Khan, Mohammad Emtiyaz, Lin, Wu, Tangkaratt, Voot, Liu, Zuozhu, Nielsen, Didrik

arXiv.org Machine LearningNov-15-2017

We present the Variational Adaptive Newton (VAN) method which is a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution reveals that VAN is a second-order method that unifies existing methods in distinct fields of continuous optimization, variational inference, and evolution strategies. Our experimental results show that VAN performs well on a wide-variety of learning tasks. This work presents a general-purpose explorative-learning method that has the potential to improve learning in areas such as active learning and reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1711.0556

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.89)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
(2 more...)

Add feedback

Wald-Kernel: Learning to Aggregate Information for Sequential Inference

Teng, Diyan, Ertin, Emre

arXiv.org Machine LearningNov-15-2017

Sequential hypothesis testing is a desirable decision making strategy in any time sensitive scenario. Compared with fixed sample-size testing, sequential testing is capable of achieving identical probability of error requirements using less samples in average. For a binary detection problem, it is well known that for known density functions accumulating the likelihood ratio statistics is time optimal under a fixed error rate constraint. This paper considers the problem of learning a binary sequential detector from training samples when density functions are unavailable. We formulate the problem as a constrained likelihood ratio estimation which can be solved efficiently through convex optimization by imposing Reproducing Kernel Hilbert Space (RKHS) structure on the log-likelihood ratio function. In addition, we provide a computationally efficient approximated solution for large scale data set. The proposed algorithm, namely Wald-Kernel, is tested on a synthetic data set and two real world data sets, together with previous approaches for likelihood ratio estimation. Our empirical results show that the classifier trained through the proposed technique achieves smaller average sampling cost than previous approaches proposed in the literature for the same error rate.

classifier, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1508.07964

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
(2 more...)

Add feedback

A unified decision making framework for supply and demand management in microgrid networks

Diddigi, Raghuram Bharadwaj, Danda, Sai Koti Reddy, Narayanam, Krishnasuri, Bhatnagar, Shalabh

arXiv.org Artificial IntelligenceNov-14-2017

This paper considers two important problems - on the supply-side and demand-side respectively and studies both in a unified framework. On the supply side, we study the problem of energy sharing among microgrids with the goal of maximizing profit obtained from selling power while meeting customer demand. On the other hand, under shortage of power, this problem becomes one of deciding the amount of power to be bought with dynamically varying prices. On the demand side, we consider the problem of optimally scheduling the time-adjustable demand - i.e., of loads with flexible time windows in which they can be scheduled. While previous works have treated these two problems in isolation, we combine these problems together and provide for the first time in the literature, a unified Markov decision process (MDP) framework for these problems. We then apply the Q-learning algorithm, a popular model-free reinforcement learning technique, to obtain the optimal policy. Through simulations, we show that our model outperforms the traditional power sharing models.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/SmartGridComm.2018.8587514

1711.05078

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.40)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Multi-Advisor Reinforcement Learning

Laroche, Romain, Fatemi, Mehdi, Romoff, Joshua, van Seijen, Harm

arXiv.org Artificial IntelligenceNov-14-2017

We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1704.00756

Genre:

Research Report (0.84)
Overview (0.66)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Add feedback

InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations

Li, Yunzhu, Song, Jiaming, Ermon, Stefano

arXiv.org Artificial IntelligenceNov-14-2017

The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

1703.0884

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (0.94)
Leisure & Entertainment (0.68)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)
(2 more...)

Add feedback

Reinforcement Learning Algorithm Selection

Laroche, Romain, Feraud, Raphael

arXiv.org Artificial IntelligenceNov-14-2017

The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically evaluated on a dialogue task where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting where algorithms update their policies after each transition, which we call SSBAS. SSBAS is evaluated on a fruit collection task where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1701.0881

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

REINFORCEjs: Gridworld with Dynamic Programming

#artificialintelligenceNov-13-2017, 23:16:26 GMT

Temporal Difference Learning Gridworld Demo // agent parameter spec to play with (this gets eval()'d on Agent reset) var spec {} spec.update This is a toy environment called **Gridworld** that is often used as a toy model in the Reinforcement Learning literature. In this particular case: - **State space**: GridWorld has 10x10 100 distinct states. The start state is the top left cell. The gray cells are walls and cannot be moved to. In this example - **Environment Dynamics**: GridWorld is deterministic, leading to the same new state given each state and action - **Rewards**: The agent receives 1 reward when it is in the center square (the one that shows R 1.0), and -1 reward in a few states (R -1.0 is shown for these).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback