Reinforcement Learning
An Introduction to Intertask Transfer for Reinforcement Learning
This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area. However, if agents are to behave intelligently in complex, dynamic, and noisy environments, we believe that they must be able to learn and adapt. The reinforcement learning (RL) paradigm is a popular way for such agents to learn from experience with minimal feedback. One of the central questions in RL is how best to generalize knowledge to successfully learn and adapt.
A Review of Reinforcement Learning
There's a great new book on the market that lays out the conceptual and algorithmic foundations of this exciting area. Reinforcement learning pioneers Rich Sutton and Andy Barto have published Reinforcement Learning: An Introduction, providing a highly accessible starting point for interested students, researchers, and practitioners. In the reinforcement learning framework, an agent acts in an environment whose state it can sense and occasionally receives some penalty or reward based on its state and action. Its learning task is to find a policy for action selection that maximizes its reward over the long haul; this task requires not only choosing actions that are associated with high reward in the current state but also thinking ahead by choosing actions that will lead the agent to more lucrative parts of the state space. Although there are many ways to attack this problem, the paradigm described in the book is to construct a value function that evaluates the "goodness" of different situations.
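To make the value-function idea concrete, here is a minimal sketch of value iteration on a toy problem. The five-state chain, the reward of 1 at the goal, and the discount factor of 0.9 are illustrative assumptions, not an example taken from the book. States closer to the goal end up with higher values, and acting greedily with respect to those values is what "thinking ahead" amounts to here.

```python
# Toy deterministic chain (an assumption for illustration): states 0..4,
# actions -1 (left) and +1 (right). Entering state 4 pays reward 1 and ends
# the episode; every other transition pays 0.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)
GAMMA = 0.9  # discount factor, chosen arbitrarily


def step(state, action):
    """Return (next_state, reward) for the toy chain."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def backup(state, action, values):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Sweep the non-terminal states until the values stop changing."""
    values = [0.0] * N_STATES  # the terminal goal state keeps value 0
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best lookahead."""
    return [max(ACTIONS, key=lambda a: backup(s, a, values)) for s in range(GOAL)]


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

In this toy problem only the final transition is rewarded, yet the value function ranks every state by how soon the goal can be reached, so the greedy policy moves right everywhere.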
Deep Reinforcement Learning based Optimal Control of Hot Water Systems
Kazmi, Hussain, Mehmood, Fahad, Lodeweyckx, Stefan, Driesen, Johan
Energy consumption for hot water production is a major draw in high efficiency buildings. Optimizing this has typically been approached from a thermodynamics perspective, decoupled from occupant influence. Furthermore, optimization usually presupposes existence of a detailed dynamics model for the hot water system. These assumptions lead to suboptimal energy efficiency in the real world. In this paper, we present a novel reinforcement learning based methodology which optimizes hot water production. The proposed methodology is completely generalizable, and does not require an offline step or human domain knowledge to build a model for the hot water vessel or the heating element. Occupant preferences too are learnt on the fly. The proposed system is applied to a set of 32 houses in the Netherlands where it reduces energy consumption for hot water production by roughly 20% with no loss of occupant comfort. Extrapolating, this translates to absolute savings of roughly 200 kWh for a single household on an annual basis. This performance can be replicated to any domestic hot water system and optimization objective, given that the fairly minimal requirements on sensor data are met. With millions of hot water systems operational worldwide, the proposed framework has the potential to reduce energy consumption in existing and new systems on a multi Gigawatt-hour scale in the years to come.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Haarnoja, Tuomas, Zhou, Aurick, Abbeel, Pieter, Levine, Sergey
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy - that is, succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
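The trade-off between reward and entropy can be illustrated with a small numeric sketch. This is not the soft actor-critic algorithm itself, which targets continuous actions and learns with neural networks. It is a single-state, discrete-action illustration in which the action values `q_values` and the temperature `alpha` are made-up numbers. It compares a deterministic greedy policy with a Boltzmann (softmax) policy under the entropy-augmented objective, expected value plus `alpha` times policy entropy.

```python
import math


def boltzmann_policy(q_values, alpha):
    """Softmax over q / alpha, shifted by the max for numerical stability."""
    m = max(q_values)
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(q_values, probs, alpha):
    """Expected action value plus alpha times the policy's entropy."""
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * entropy(probs)


q_values = [1.0, 0.5, 0.0]  # hypothetical action values for one state
alpha = 0.5                 # hypothetical temperature

soft_policy = boltzmann_policy(q_values, alpha)
greedy = [1.0, 0.0, 0.0]    # always pick the highest-valued action

soft_score = soft_objective(q_values, soft_policy, alpha)
greedy_score = soft_objective(q_values, greedy, alpha)

# For the Boltzmann policy the objective equals alpha * log-sum-exp(q / alpha).
log_sum_exp = alpha * math.log(sum(math.exp(q / alpha) for q in q_values))

print(soft_policy)
print(soft_score, greedy_score, log_sum_exp)
```

Under this objective the stochastic policy scores higher than the greedy one, because the entropy bonus outweighs the small loss in expected value. As `alpha` shrinks toward zero, the Boltzmann policy approaches the greedy one.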
geek-ai/MAgent
MAgent is a research platform for many-agent reinforcement learning. Unlike previous research platforms that focus on reinforcement learning research with a single agent or only a few agents, MAgent aims at supporting reinforcement learning research that scales up from hundreds to millions of agents. MAgent supports Linux and OS X running Python 2.7 or Python 3. We make no assumptions about the structure of your agents. You can write rule-based algorithms or use deep learning frameworks. The training time of the following tasks is about 1 day on a GTX1080-Ti card.
Artificial Intelligence Newsletter - O'Reilly Media
Simulators, such as digital twins, which allow developers to speed up the development of AI systems, along with the reinforcement learning libraries that integrate with them (the RL library that's part of RISELab's Ray is a great example); developer tools for building AI applications that can process multimodal inputs; tools that target developers who aren't data engineers or data scientists.
[P] Pricing strategy using reinforcement learning • r/MachineLearning
I really like your write-up on this. As someone who does pricing for SaaS, I would say that if you look beyond the theoretical application of this, you will run into the customer perception problems you listed (changing prices, and especially fairness if people are being charged differently). There may be ways you can obfuscate these price differentials through opaque marketplaces or private contracts. This is easiest on small transactions with understood variability (AWS spot pricing, etc.). I would also add that value-based pricing doesn't ignore competition; it's just very hard to assess value regardless of direct competition unless you are dealing in commodities.
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Dann, Christoph, Lattimore, Tor, Brunskill, Emma
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
endgameinc/gym-malware
This is a malware manipulation environment for OpenAI's gym. OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This makes it possible to write agents that learn to manipulate PE files (e.g., malware) to achieve some objective (e.g., bypass AV) based on a reward provided by taking specific manipulation actions. Create an AI that learns through reinforcement learning which functionality-preserving transformations to make on a malware sample to break through / bypass machine learning static-analysis malware detection. There are two basic concepts in reinforcement learning: the environment (in our case, the malware sample) and the agent (namely, the algorithm used to change the environment). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).
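The interaction described above, where the agent sends actions and the environment replies with observations and rewards, can be sketched as a plain loop. `ToyEnv` below is a hypothetical stand-in written only for illustration. It is not the gym-malware environment and does not use the OpenAI Gym API; its `reset` and `step` methods merely mimic the general shape of such an interface.

```python
import random


class ToyEnv:
    """Hypothetical counter environment: reach `target` to earn reward 1."""

    def __init__(self, target=3, max_steps=20):
        self.target = target
        self.max_steps = max_steps
        self.state = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.state = 0
        self.steps = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 or +1); return (observation, reward, done)."""
        self.state += action
        self.steps += 1
        reached = self.state == self.target
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else 0.0
        return self.state, reward, done


def run_episode(env, choose_action):
    """Run one episode: the agent acts, the environment answers."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward, env.steps


rng = random.Random(0)  # seeded so repeated runs behave the same way
random_return, random_steps = run_episode(ToyEnv(), lambda obs: rng.choice((-1, 1)))
always_up_return, always_up_steps = run_episode(ToyEnv(), lambda obs: 1)

print(random_return, random_steps)
print(always_up_return, always_up_steps)
```

A learning agent would replace the fixed `choose_action` callables with a policy that is updated from the observed rewards; the loop itself stays the same.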
Adversarial Learning for Good: My Talk at #34c3 on Deep Learning Blindspots
When I was first introduced to the idea of adversarial learning for security purposes by Clarence Chio's 2016 DEF CON talk and his related open-source library deep-pwning, I immediately started wondering about applications of the field, both to make robust and well-tested models and as a preventative measure against predatory machine learning practices in the field. After reading more literature and utilizing several other open-source libraries, I realized most examples and research focused on malicious uses, such as sending spam or malware without detection, or crashing self-driving cars. Although I find this research interesting, I wanted to determine if adversarial learning could be used for "good". In case you haven't been following the explosion of adversarial learning in neural network research, papers and conferences, let's take a whirlwind tour of some concepts to get on the same page and provide further reading if you open up arXiv for fun on the weekend. Similarly to how we use the loss function to train our network, researchers found we can use this same method to find weak links in our network and adversarial examples that exploit them.