Reinforcement Learning


Giving Up Control: Neurons as Reinforcement Learning Agents

arXiv.org Artificial Intelligence

Artificial Intelligence has historically relied on planning, heuristics, and handcrafted approaches designed by experts, all the while claiming to pursue the creation of intelligence. This approach fails to acknowledge that intelligence emerges from the dynamics within a complex system. Neurons in the brain are governed by local rules, where no single neuron, or group of neurons, coordinates or controls the others. This local structure gives rise to the dynamics from which intelligence can emerge. Populations of neurons must compete with their neighbors for resources, inhibition, and activity representation. At the same time, they must cooperate so that the population and the organism can perform high-level functions. To this end, we propose modeling neurons as reinforcement learning agents, where each neuron may be viewed as an independent actor trying to maximize its own self-interest. By framing learning in this way, we open the door to an entirely new approach to building intelligent systems.


The 10 Best Free Online Artificial Intelligence And Machine Learning Courses For 2020

#artificialintelligence

The demand for people with knowledge and skills in artificial intelligence (AI) and machine learning (ML) hugely outstrips the supply. This means that learning and gaining qualifications in these subjects can be a great way to enhance your career prospects. However, not everyone has the spare time and money to spend years studying for a degree or other formal qualifications. Today, with the wealth of freely available educational content online, it may not be necessary. There are so many courses, tutorials, and guides available online that it is perfectly possible to gain a thorough grounding in these subjects without paying a penny.


Reinforcement Learning Industry Applications -Soulpage IT

#artificialintelligence

Reinforcement learning has grown into the latest machine learning tool that industry uses to solve the complex problems that arise every day as market dynamics change. As intelligent systems multiply, reinforcement learning for industrial applications is producing advanced solutions to tackle these problems. Industries are increasingly recognizing the importance of reinforcement learning in their operations, which helps them become more customer-centric. The future goal of industries using reinforcement learning would be a 100% return on investment. Please connect with us if this article resonates with you.


Reinforcement Learning for Electricity Network Operation

arXiv.org Machine Learning

The goal of this challenge is to test the potential of Reinforcement Learning (RL) to control electrical power transmission in the most cost-effective manner while keeping people and equipment safe from harm. Solving this challenge may have very positive impacts on society, as governments move to decarbonize the electricity sector and to electrify other sectors to help reach IPCC climate goals. Existing software, computational methods and optimal power flow solvers are not adequate for real-time network operations on short temporal horizons in a reasonable computational time. With recent changes in electricity generation and consumption patterns, system operation is becoming more of a stochastic than a deterministic control problem. In order to overcome these complexities, new computational methods are required. The intention of this challenge is to explore RL as a solution method for electricity network control. There may be under-utilized, cost-effective flexibility in the power network that RL techniques can identify and capitalize on, and that human operators and traditional solution techniques are unaware of or unaccustomed to. An RL agent that can act in conjunction with, or in parallel with, human network operators will optimize grid security and reliability, allowing more renewable resources to be connected while minimizing cost, maintaining supply to customers, and preventing damage to electrical equipment. Another aim of the project is to broaden the audience for the problem of electricity network control and to foster collaboration between experts in both the power systems community and the wider RL/ML community.


Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

arXiv.org Machine Learning

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
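As a rough illustration of the baseline this abstract analyzes, the sketch below runs synchronous TD(0) with Polyak-Ruppert iterate averaging, drawing one next-state sample per state at each iteration as a stand-in for the generative model. It is not the authors' code: the function names, the tabular toy setting, the polynomial step size `k ** -omega`, and the helper `exact_values` used as a reference are all assumptions made for this example.

```python
import random


def exact_values(P, r, gamma, sweeps=2000):
    """Reference values from repeated Bellman backups with the known model."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[s] + gamma * sum(p * v[j] for j, p in enumerate(P[s]))
             for s in range(n)]
    return v


def td_policy_evaluation(P, r, gamma, num_iters, omega=0.6, seed=0):
    """Synchronous TD(0) with Polyak-Ruppert averaging (illustrative sketch).

    P[s] lists next-state probabilities under the fixed policy and r[s] is
    the reward in state s. Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    n = len(r)
    states = list(range(n))
    v = [0.0] * n
    v_avg = [0.0] * n
    for k in range(1, num_iters + 1):
        alpha = k ** -omega  # assumed polynomial step size, omega in (1/2, 1)
        # Generative model: one sampled next state for every state.
        nxt = [rng.choices(states, weights=P[s])[0] for s in states]
        # TD(0) update applied to all states at once.
        v = [v[s] + alpha * (r[s] + gamma * v[nxt[s]] - v[s]) for s in states]
        # Running average of the iterates (Polyak-Ruppert).
        v_avg = [v_avg[s] + (v[s] - v_avg[s]) / k for s in states]
    return v, v_avg
```

Calling `td_policy_evaluation(P, r, 0.9, 20000)` on a small chain and comparing both returned vectors with `exact_values(P, r, 0.9)` in the max norm gives a feel for the $\ell_\infty$-error the abstract refers to.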


DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

arXiv.org Machine Learning

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. When using standard supervised methods (e.g., for bandits), on-policy data collection provides "hard negatives" that correct the model in precisely those states and actions that the policy is likely to visit. We call this phenomenon "corrective feedback." We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function. In fact, Q-learning and related methods can exhibit pathological interactions between the distribution of experience collected by the agent and the policy induced by training on that experience, leading to potential instability, sub-optimal convergence, and poor results when learning from noisy, sparse or delayed rewards. We demonstrate the existence of this problem, both theoretically and empirically. We then show that a specific correction to the data distribution can mitigate this issue. Based on these observations, we propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training, resulting in substantial improvements in a range of challenging RL settings, such as multi-task learning and learning from noisy reward signals. A blog post summarizing this work is available at: https://bair.berkeley.edu/blog/2020/03/16/discor/.
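The re-weighting step mentioned at the end of this abstract might look roughly like the sketch below. The abstract does not state the weighting rule, so the exponential down-weighting by an estimated error of each bootstrap target is an assumption made for illustration, as are the names `transition_weights`, `weighted_bellman_loss`, and `temperature`. It should not be read as the DisCor implementation.

```python
import math


def transition_weights(target_errors, gamma=0.99, temperature=1.0):
    """Normalized weights that shrink as the estimated target error grows.

    target_errors[i] is a hypothetical non-negative estimate of how wrong
    the bootstrap target of transition i is.
    """
    raw = [math.exp(-gamma * e / temperature) for e in target_errors]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_bellman_loss(q_values, targets, weights):
    """Squared Bellman error, with each transition scaled by its weight."""
    return sum(w * (q - y) ** 2 for q, y, w in zip(q_values, targets, weights))
```

Transitions whose targets are believed to be accurate dominate the loss, while those with unreliable targets contribute little until their targets improve.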


Value Variance Minimization for Learning Approximate Equilibrium in Aggregation Systems

arXiv.org Machine Learning

Aggregation systems have been extremely successful at matching resources (e.g., taxis, food, bikes, shopping items) to customer demand. In aggregation systems, a central entity (e.g., Uber, Food Panda, Ofo) aggregates supply (e.g., drivers, delivery personnel) and matches demand to supply on a continuous basis (sequential decisions). Because the central entity aims to maximize its own profits, individual suppliers are sacrificed, which creates an incentive for individuals to leave the system. In this paper, we consider the problem of learning approximate equilibrium solutions (win-win solutions) in aggregation systems, so that individuals have an incentive to remain in the aggregation system. Unfortunately, such systems have thousands of agents and must account for demand uncertainty, and the underlying problem is a (Partially Observable) Stochastic Game. Given the significant complexity of learning or planning in a stochastic game, we make three key contributions: (a) To exploit the infinitesimally small contribution of each agent and the anonymity of interactions (reward and transitions between agents depend on agent counts), we represent this as a Multi-Agent Reinforcement Learning (MARL) problem that builds on insights from the non-atomic congestion games model; (b) We provide a novel variance reduction mechanism for moving the joint solution towards a Nash equilibrium that exploits the infinitesimally small contribution of each agent; and finally (c) We provide detailed results on three different domains to demonstrate the utility of our approach in comparison to state-of-the-art methods.


Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks

arXiv.org Artificial Intelligence

Reinforcement learning systems require good representations to work well. For decades, practical success in reinforcement learning was limited to small domains. Deep reinforcement learning systems, on the other hand, are scalable, do not depend on domain-specific prior knowledge, and have been successfully used to play Atari, to navigate in 3D from pixels, and to control high-degree-of-freedom robots. Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices. Even well-tuned systems exhibit significant instability both within a trial and across experiment replications. In practice, significant expertise and trial and error are usually required to achieve good performance. One potential source of the problem is known as catastrophic interference: later training decreases performance by overriding previous learning. Interestingly, the powerful generalization that makes Neural Networks (NN) so effective in batch supervised learning might explain the challenges of applying them in reinforcement learning tasks. In this paper, we explore how online NN training and interference interact in reinforcement learning. We find that simply re-mapping the input observations to a high-dimensional space improves learning speed and parameter sensitivity. We also show that this preprocessing reduces interference in prediction tasks. More practically, we provide a simple approach to NN training that is easy to implement and requires little additional computation. We demonstrate that our approach improves performance in both prediction and control with an extensive batch of experiments in classic control domains.
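One way to picture "re-mapping the input observations to a high-dimensional space" is tile coding, which turns a low-dimensional continuous observation into a sparse binary vector. The abstract does not name the mapping, so using tile coding here is an assumption, and the function names and parameters (`tile_code`, `to_binary`, `num_tilings`, `tiles_per_dim`) are hypothetical choices for this sketch.

```python
def tile_code(obs, lows, highs, num_tilings=4, tiles_per_dim=4):
    """Return the index of the active tile in each tiling for one observation.

    Each tiling is a grid over the observation space, shifted by a fraction
    of a tile width so that nearby observations share most of their tiles.
    """
    cells_per_dim = tiles_per_dim + 1  # one extra cell absorbs the shift
    tiles_per_tiling = cells_per_dim ** len(obs)
    active = []
    for t in range(num_tilings):
        offset = t / num_tilings  # shift, in units of one tile width
        index = 0
        for x, lo, hi in zip(obs, lows, highs):
            scaled = (x - lo) / (hi - lo) * tiles_per_dim
            cell = min(max(int(scaled + offset), 0), tiles_per_dim)
            index = index * cells_per_dim + cell
        active.append(t * tiles_per_tiling + index)
    return active


def to_binary(active, size):
    """Expand active tile indices into a sparse 0/1 feature vector."""
    features = [0.0] * size
    for i in active:
        features[i] = 1.0
    return features
```

With the defaults, a 2-D observation becomes a 100-dimensional vector with exactly four ones. Distant observations activate disjoint tiles, so an update for one input leaves the features of the other untouched, which is the intuition behind reduced interference.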


Data For Science Sunday Briefing

#artificialintelligence

Dear friends, welcome to the March 15 edition of the Sunday Briefing. This week we take advantage of the increased worldwide attention on the current SARS-CoV-2 pandemic and take a deep look at how we might soon be looking at 1 Million Infected Americans, how we can strengthen the use of models during epidemics, and The effect of travel restrictions. In addition, we take another look at Simpson's Paradox, Data Trees, and a new dataset on grocery purchases in London, analyze how to uncover patterns in high dimensional time series, and give an overview of Causal Interpretability for ML and Deep Reinforcement Learning For Trading. Finally, in our video of the week, the good folks over at 3Blue1Brown give us an excellent overview of Exponential growth and epidemics to help us better interpret the numbers we see in the news. Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, just go ahead and forward this email to them and help us spread the word!


Particle-Based Adaptive Discretization for Continuous Control using Deep Reinforcement Learning

arXiv.org Machine Learning

Learning controls in high-dimensional continuous action spaces, such as controlling the movements of highly articulated agents and robots, has long been a challenge for model-free deep reinforcement learning (DRL). In this paper we propose a general, yet simple, framework for improving the action exploration of policy gradient DRL algorithms. Our approach adapts ideas from the particle filtering literature to dynamically discretize the continuous action space and track policies represented as a mixture of Gaussians. We demonstrate the applicability of our approach on state-of-the-art DRL baselines in challenging high-dimensional motor tasks involving articulated agents. We show that our adaptive particle-based discretization leads to improved final performance and speed of convergence compared with uniform discretization schemes and with corresponding implementations in continuous action spaces, highlighting the importance of exploration. In addition, the resulting policies are more stable, exhibiting less variance across different training trials.
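To make the idea of a particle-based mixture-of-Gaussians policy concrete, the sketch below handles a single action dimension: each particle is a Gaussian with a location, a scale, and a logit, an action is drawn by picking a particle and perturbing around it, and low-weight particles are replaced by copies of the heaviest one, in the spirit of particle-filter resampling. The abstract gives no implementation details, so the resampling rule, the threshold `min_weight`, and all function names are assumptions for illustration only.

```python
import math
import random


def softmax(logits):
    """Turn particle logits into mixture weights."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, means, stds, rng):
    """Pick a particle by weight, then sample from its Gaussian."""
    weights = softmax(logits)
    i = rng.choices(range(len(means)), weights=weights)[0]
    return i, rng.gauss(means[i], stds[i])


def resample(logits, means, stds, min_weight=0.01):
    """Replace particles below min_weight with copies of the heaviest one.

    The heaviest particle's mass is split evenly between itself and the
    copy. min_weight is assumed to be below 1 / number of particles.
    """
    logits, means, stds = list(logits), list(means), list(stds)
    weights = softmax(logits)
    dead = [i for i, w in enumerate(weights) if w < min_weight]
    for i in dead:
        best = max(range(len(logits)), key=lambda j: logits[j])
        split = logits[best] - math.log(2.0)
        logits[best] = logits[i] = split
        means[i], stds[i] = means[best], stds[best]
    return logits, means, stds
```

In a full agent the logits, locations, and scales would come from a policy network and be trained with a policy gradient method; here they are plain lists so the sampling and resampling steps can be read in isolation.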