 Reinforcement Learning


Reinforcement Learning: a Subtle Introduction

#artificialintelligence

Reinforcement learning is a branch of machine learning and AI that takes a distinctive approach to building models: rather than learning from labeled examples, an agent learns by trial and error, receiving rewards or penalties for the actions it takes. The objective of reinforcement learning is to teach a computer or machine to perform a specific task with a high rate of success. It is also important to note what reinforcement learning isn't. These models are narrow, task-specific AI (sometimes called artificial narrow intelligence, or ANI), meaning they can only perform the specific tasks they were trained for.
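To make the trial-and-error loop concrete, here is a minimal tabular Q-learning sketch. The corridor environment, its reward, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices for illustration; Q-learning is just one classic reinforcement learning algorithm, not the only one.

```python
import random

# Hypothetical toy environment: a 1-D corridor with states 0..4.
# The agent starts in state 0; reaching state 4 ends the episode with reward +1.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, 0.0, False


def best_action(q, state, rng):
    """Greedy action, breaking ties at random so early episodes still wander."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes try a random action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
    print(policy)
```

After training, the greedy policy should move right (+1) from every non-terminal state: the agent was never told where the goal is, it discovered this only from the reward signal.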


Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

arXiv.org Artificial Intelligence

We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a bias function $V$ of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when $V\equiv0$, but our scheme works best when $V$ is a known, reasonably good approximation to the optimal cost function $J^*$. When $V$ is equal to the cost function $J_{\mu}$ of some known policy $\mu$ and there is only one aggregate state, our scheme is equivalent to the rollout algorithm based on $\mu$ (i.e., the result of a single policy improvement starting with the policy $\mu$). When $V=J_{\mu}$ and there are multiple aggregate states, our aggregation approach can be used as a more powerful form of improvement of $\mu$. Thus, when combined with an approximate policy evaluation scheme, our approach can form the basis for a new and enhanced form of approximate policy iteration. When $V$ is a generic bias function, our scheme is equivalent to approximation in value space with lookahead function equal to $V$ plus a local correction within each aggregate state. The local correction levels are obtained by solving a low-dimensional aggregate DP problem, yielding an arbitrarily close approximation to $J^*$ when the number of aggregate states is sufficiently large. Except for the bias function, the aggregate DP problem is similar to that of the classical aggregation framework, and its algorithmic solution by simulation or other methods is nearly identical to that for classical aggregation, assuming values of $V$ are available when needed.
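The abstract's special case ($V=J_{\mu}$ with a single aggregate state) reduces to plain rollout: a one-step lookahead $\tilde\mu(i) \in \arg\min_u \sum_j p_{ij}(u)\,\big(g(i,u,j) + \alpha J_{\mu}(j)\big)$ in standard DP notation. The sketch below implements only that rollout step on a small hypothetical MDP; it is not the biased aggregation scheme itself, and the states, costs, and base policy are invented for illustration.

```python
# Hypothetical 3-state discounted MDP in cost-minimization form:
# TRANSITIONS[state][action] -> list of (probability, next_state, stage_cost)
TRANSITIONS = {
    0: {"a": [(1.0, 0, 2.0)], "b": [(0.5, 1, 1.0), (0.5, 2, 1.0)]},
    1: {"a": [(1.0, 2, 0.5)], "b": [(1.0, 0, 3.0)]},
    2: {"a": [(1.0, 2, 0.0)], "b": [(1.0, 1, 1.0)]},
}
DISCOUNT = 0.9  # the discount factor (alpha in the formula above)


def q_factor(state, action, J):
    """Expected stage cost plus discounted cost-to-go J of the next state."""
    return sum(p * (g + DISCOUNT * J[j]) for p, j, g in TRANSITIONS[state][action])


def evaluate(policy, iterations=1000):
    """Approximate J_mu by repeatedly applying the Bellman operator of a fixed policy."""
    J = {s: 0.0 for s in TRANSITIONS}
    for _ in range(iterations):
        J = {s: q_factor(s, policy[s], J) for s in TRANSITIONS}
    return J


def rollout_policy(base_policy):
    """One policy improvement step: one-step lookahead using the base policy's cost J_mu."""
    J_mu = evaluate(base_policy)
    return {s: min(TRANSITIONS[s], key=lambda u: q_factor(s, u, J_mu)) for s in TRANSITIONS}


if __name__ == "__main__":
    mu = {0: "a", 1: "b", 2: "b"}  # a deliberately poor base policy
    mu_tilde = rollout_policy(mu)
    print(mu_tilde, evaluate(mu), evaluate(mu_tilde))
```

By the policy improvement property, the rollout policy's cost is no worse than $J_{\mu}$ at every state; the multi-aggregate-state scheme in the abstract is proposed as a stronger improvement than this single step.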


Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

arXiv.org Artificial Intelligence

Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. The agent needs to learn a latent representation together with a control policy to perform the task. Fitting a high-capacity encoder using a scarce reward signal is not only sample inefficient, but also prone to suboptimal convergence. Two ways to improve sample efficiency are to extract relevant features for the task and use off-policy algorithms. We dissect various approaches of learning good latent features, and conclude that the image reconstruction loss is the essential ingredient that enables efficient and stable representation learning in image-based RL. Following these findings, we devise an off-policy actor-critic algorithm with an auxiliary decoder that trains end-to-end and matches state-of-the-art performance across both model-free and model-based algorithms on many challenging control tasks. We release our code to encourage future research on image-based RL.
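One reason off-policy algorithms are more sample efficient is that they can reuse past experience stored in a replay buffer instead of discarding each transition after one update. The sketch below is a generic fixed-capacity buffer with uniform sampling, written under the assumption that observations are opaque objects (image frames, in the paper's setting); it is not the authors' released code.

```python
import random
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Transition:
    obs: Any          # e.g. a stack of image frames
    action: Any
    reward: float
    next_obs: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity: int, seed: int = 0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._storage: List[Transition] = []
        self._next = 0  # slot the next insert overwrites once the buffer is full
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, transition: Transition) -> None:
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            self._storage[self._next] = transition  # overwrite the oldest entry
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform sampling with replacement, so batch_size may exceed len(self)."""
        if not self._storage:
            raise ValueError("cannot sample from an empty buffer")
        return [self._rng.choice(self._storage) for _ in range(batch_size)]
```

A training loop would call `add` after every environment step and `sample` before each gradient update, so a single expensive interaction can contribute to many updates.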


Google Accelerates Quantum Computation with Classical Machine Learning

#artificialintelligence

Tech giant Google's recent claim regarding quantum supremacy created a buzz in the computer science community and got global mainstream media talking about quantum computing breakthroughs. Yesterday Google fed the public's growing interest in the topic with a blog post introducing a study on improving quantum computation using classical machine learning. The qubit is the most basic constituent of quantum computing, and also poses one of the most significant challenges for the realization of near-term quantum computers. Various characteristics of qubits have made it challenging to control them. Google AI explains that issues such as imperfections in the control electronics can "impact the fidelity of the computation and thus limit the applications of near-term quantum devices."



Attention-based Fault-tolerant Approach for Multi-agent Reinforcement Learning Systems

arXiv.org Artificial Intelligence

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. In many real-world applications, the agents can only acquire a partial view of the world. Moreover, in realistic settings, one or more agents that show arbitrarily faulty or malicious behavior may suffice to make current coordination mechanisms fail. In this paper, we study a practical scenario that considers the security issues arising in the presence of agents with arbitrarily faulty or malicious behavior. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case where an agent's policy can be made conditional upon all other agents' observations. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) algorithm, which selects correct and relevant information for each agent at every time step. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some complex environments and can adapt to various kinds of noisy environments without tuning the complexity of the algorithm. Furthermore, FT-Attn can effectively handle the complex situation in which an agent needs to access the correct observations of multiple agents at the same time.
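The core operation behind "selecting relevant information for each agent" is attention: an agent scores every other agent's message against its own query and takes a softmax-weighted average. Below is a single-head scaled dot-product attention sketch with hand-written, hypothetical query, key, and value vectors; in FT-Attn these would come from learned projections, and a multi-head version runs several such heads in parallel and concatenates their outputs. This is not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Single-head scaled dot-product attention for one agent.

    Returns the attention weight given to each other agent and the
    weighted combination of their value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Hypothetical messages from three other agents; the first matches the query best.
    weights, context = attend(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        values=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    )
    print(weights, context)
```

A message whose key disagrees with the agent's query (for example, one sent by a faulty agent) receives a lower weight and so contributes less to the agent's input.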


End-to-End Motion Planning of Quadrotors Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Separating these tasks is the prevailing approach in current state-of-the-art navigation methods. Each task is performed by an individual module, so modularity is easily attained. Nevertheless, modularity comes at the cost of possible incompatibility, especially in the presence of erroneous modules: an erroneous module in the pipeline can easily cause the other modules to fail as well. Therefore, this work attempts to unify these tasks within a single, reliable module using deep reinforcement learning (RL) [13]-[16].


Ready, Set, Algorithms! Teams Learn AI by Racing Cars

#artificialintelligence

Amazon Web Services (AWS) has developed the DeepRacer League, a competition designed to teach a branch of artificial intelligence (AI) known as reinforcement learning, in which algorithms learn the correct way to perform an action through trial and error and observation. As part of the league, teams or individuals build and train AI algorithms using Amazon SageMaker, then deploy them to self-driving model cars about 10 inches long, which they race around a track roughly 17 feet by 26 feet. Morningstar is one of the companies participating in the DeepRacer League; thanks to the training, the company expects to have dozens of projects based on reinforcement learning and other machine learning techniques in deployment by the end of next year. AWS developed the DeepRacer program to teach software developers about machine learning in a more engaging way than reading scientific articles.
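In DeepRacer, the "trial and error" is steered by a reward function that participants write in Python. The sketch below follows the widely shown "stay near the center line" pattern: the function receives a `params` dictionary and returns a float. The keys used here (`track_width`, `distance_from_center`, `all_wheels_on_track`) follow the AWS DeepRacer reward-function documentation as I recall it, so check the current docs before relying on them; the band thresholds are illustrative choices.

```python
def reward_function(params):
    """Reward staying close to the center line; near-zero reward when off track."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Almost no reward if any wheel has left the track.
    if not params["all_wheels_on_track"]:
        return 1e-3

    # Three bands at increasing distance from the center line (illustrative thresholds).
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3  # likely about to leave the track

    return float(reward)
```

Returning a small positive value instead of zero for bad states is a common choice in these examples; the training algorithm then learns which steering and speed actions keep the car in the high-reward band.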


DeepMind Has Quietly Open Sourced Three New Impressive Reinforcement Learning Frameworks

#artificialintelligence

Deep reinforcement learning (DRL) has been at the center of some of the biggest breakthroughs in artificial intelligence (AI) in the last few years. However, despite all its progress, DRL methods remain difficult to apply in mainstream solutions given the lack of tooling and libraries. Consequently, DRL remains mostly a research activity that has seen little adoption in real-world machine learning solutions. Addressing that problem requires better tools and frameworks. Among the current generation of AI leaders, DeepMind stands alone as the company that has done the most to advance DRL research and development. Recently, the Alphabet subsidiary has released a series of new open source technologies that can help streamline the adoption of DRL methods.


If MaxEnt RL is the Answer, What is the Question?

arXiv.org Artificial Intelligence

Experimentally, it has been observed that humans and animals often make decisions that do not maximize their expected utility, but rather choose outcomes randomly, with probability proportional to expected utility. Probability matching, as this strategy is called, is equivalent to maximum entropy reinforcement learning (MaxEnt RL). However, MaxEnt RL does not optimize expected utility. In this paper, we formally show that MaxEnt RL does optimally solve certain classes of control problems with variability in the reward function. In particular, we show (1) that MaxEnt RL can be used to solve a certain class of POMDPs, and (2) that MaxEnt RL is equivalent to a two-player game where an adversary chooses the reward function. These results suggest a deeper connection between MaxEnt RL, robust control, and POMDPs, and provide insight into the types of problems for which we might expect MaxEnt RL to produce effective solutions. Specifically, our results suggest that domains with uncertainty in the task goal may be especially well-suited for MaxEnt RL methods.
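Why MaxEnt RL "does not optimize expected utility" is easiest to see in a one-step problem. MaxEnt RL maximizes $\mathbb{E}_\pi[r] + \alpha \mathcal{H}(\pi)$, where the temperature $\alpha$ is notation assumed here; the optimum is the stochastic policy $\pi(a) \propto \exp(r(a)/\alpha)$, with optimal value $\alpha \log \sum_a \exp(r(a)/\alpha)$. The sketch checks this numerically on hypothetical rewards; it illustrates only the objective, not the paper's POMDP or adversarial-game results.

```python
import math
from typing import List, Sequence


def maxent_policy(rewards: Sequence[float], alpha: float) -> List[float]:
    """Maximizer of E[r] + alpha * H(pi) for a one-step problem: softmax(r / alpha)."""
    m = max(rewards)  # shift for numerical stability; does not change the softmax
    exps = [math.exp((r - m) / alpha) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]


def soft_objective(policy: Sequence[float], rewards: Sequence[float], alpha: float) -> float:
    """Expected reward plus alpha times the Shannon entropy (in nats) of the policy."""
    expected = sum(p * r for p, r in zip(policy, rewards))
    entropy = -sum(p * math.log(p) for p in policy if p > 0)
    return expected + alpha * entropy


def soft_value(rewards: Sequence[float], alpha: float) -> float:
    """Closed-form optimum: alpha * log(sum(exp(r / alpha)))."""
    m = max(rewards)
    return m + alpha * math.log(sum(math.exp((r - m) / alpha) for r in rewards))


if __name__ == "__main__":
    rewards, alpha = [1.0, 0.5, 0.0], 0.5  # hypothetical one-step rewards
    pi = maxent_policy(rewards, alpha)
    print(pi, soft_objective(pi, rewards, alpha), soft_value(rewards, alpha))
```

The greedy policy that always picks the best action has the highest expected reward but zero entropy, so it scores lower on the MaxEnt objective; as $\alpha \to 0$ the MaxEnt policy approaches the greedy one.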