 Reinforcement Learning


Robots learn to get back up after a fall in an unfamiliar environment

New Scientist - News

Robots can pick themselves up after a fall, even in an unfamiliar environment, thanks to an artificially intelligent controller that can adapt to new scenarios. It could make four-legged robots more useful in responding to natural disasters, such as earthquakes. Zhibin (Alex) Li at the University of Edinburgh, UK, and his colleagues used an AI technique called deep reinforcement learning to teach four-legged robots a set of basic skills, such as trotting, steering and fall recovery. This involves the robots experimenting with different ways of moving and being rewarded with a numerical score for achieving a certain goal, such as standing up after a fall, and penalised for failing. This lets the AI recognise which actions are desired and repeat them in similar situations in the future.


NeurIPS: Shipra Agrawal on the appeal of reinforcement learning

#artificialintelligence

As deep neural networks have come to dominate AI, the Conference on Neural Information Processing Systems (NeurIPS) has become the most popular conference in the field. And at the most popular conference in the field, one of the most popular topics is reinforcement learning: at this year's NeurIPS, 95 accepted papers use the term in their titles. "Reinforcement learning is very, very powerful, because you can kind of learn anything, adaptively from the feedback, and by exploring the decision space," says Shipra Agrawal, an Amazon Scholar, an assistant professor in Columbia University's Industrial Engineering and Operations Research Department, and an area chair at NeurIPS, who studies reinforcement learning. "In concept, it's very akin to how humans learn, by trial and error, and how they adapt to what they see -- without requiring a loss function and so on, just by some kind of rewards or positive feedback." In reinforcement learning, an agent explores its environment, trying out different responses to different states of affairs, gradually learning a set of policies that will enable it to maximize some reward.
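
The trial-and-error loop described above can be illustrated with a minimal sketch of tabular Q-learning. Everything here is an assumption made for illustration and does not come from the article: the five-state corridor environment, the function names `train_q_learning` and `greedy_policy`, and the hyperparameter values. The agent starts at the left end, is rewarded only when it reaches the right end, and mixes random exploration with exploiting what it has learned so far.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Actions: 0 = step left, 1 = step right. The episode ends with
    reward 1.0 when the agent reaches the right-most cell.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, otherwise act greedily
            # (ties are broken in favour of stepping right).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the preferred action name for each state."""
    return ["left" if left > right else "right" for left, right in q]


if __name__ == "__main__":
    print(greedy_policy(train_q_learning()))
```

In this toy setting the learned policy should prefer stepping right in every non-terminal state, since that is the only route to the reward.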


Deep Reinforcement Learning for Stock Portfolio Optimization

arXiv.org Artificial Intelligence

Stock portfolio optimization is the process of constantly redistributing money across a pool of stocks. In this paper, we formulate the problem so that Reinforcement Learning can be applied to the task properly. To maintain realistic assumptions about the market, we also incorporate transaction cost and a risk factor into the state. On top of that, we apply various state-of-the-art Deep Reinforcement Learning algorithms for comparison. Since the action space is continuous, the realistic formulation was tested under a family of state-of-the-art continuous policy gradient algorithms: Deep Deterministic Policy Gradient (DDPG), Generalized Deterministic Policy Gradient (GDPG) and Proximal Policy Optimization (PPO), where the former two perform much better than the last one. Next, we present an end-to-end solution for the task, with Minimum Variance Portfolio Theory for stock subset selection and the Wavelet Transform for extracting multi-frequency data patterns. Observations and hypotheses about the results are discussed, as well as possible future research directions.
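
To make the role of transaction cost concrete, here is a minimal sketch of a one-period reward for a rebalancing action. It is not the paper's formulation: the function name `rebalance_reward`, the proportional cost model and the default `cost_rate` are all assumptions chosen for illustration. The action is a vector of portfolio weights, and the reward is the portfolio return after paying a cost proportional to turnover.

```python
def rebalance_reward(old_weights, new_weights, asset_returns, cost_rate=0.001):
    """One-period net return of moving from old_weights to new_weights.

    Hypothetical model: a proportional cost is charged on turnover (the
    total absolute change in weights), then the remaining wealth earns
    the weighted average of the per-asset returns.
    """
    turnover = sum(abs(new - old) for new, old in zip(new_weights, old_weights))
    cost = cost_rate * turnover
    gross_return = sum(w * r for w, r in zip(new_weights, asset_returns))
    return (1.0 - cost) * (1.0 + gross_return) - 1.0


if __name__ == "__main__":
    # Keeping the weights unchanged incurs no cost in this model.
    print(rebalance_reward([0.5, 0.5], [0.5, 0.5], [0.02, -0.01]))
    # Swapping the whole portfolio into the first asset pays for the turnover.
    print(rebalance_reward([0.0, 1.0], [1.0, 0.0], [0.01, 0.0]))
```

Under a cost model like this, an agent that trades too often sees its reward eroded, which is one reason to expose cost information to the learner.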


A Deep Reinforcement Learning Approach for Ramp Metering Based on Traffic Video Data

arXiv.org Artificial Intelligence

Ramp metering, which uses traffic signals to regulate vehicle flows from on-ramps, has been widely implemented to improve vehicle mobility on freeways. Previous studies generally update signal timings in real time based on predefined traffic measures collected by point detectors, such as traffic volumes and occupancies. Compared with point detectors, traffic cameras, which have been increasingly deployed on road networks, can cover larger areas and provide more detailed traffic information. In this work, we propose a deep reinforcement learning (DRL) method to explore the potential of traffic video data in improving the efficiency of ramp metering. The proposed method uses traffic video frames as inputs and learns the optimal control strategies directly from the high-dimensional visual inputs. A real-world case study demonstrates that, in comparison with a state-of-the-practice method, the proposed DRL method results in 1) lower travel times in the mainline, 2) shorter vehicle queues at the on-ramp, and 3) higher traffic flows downstream of the merging area. The results suggest that the proposed method is able to extract useful information from the video data for better ramp metering controls.


Deep Reinforcement Learning for Long Term Hydropower Production Scheduling

arXiv.org Artificial Intelligence

We explore the use of deep reinforcement learning to provide strategies for long term scheduling of hydropower production. We consider a use-case where the aim is to optimise the yearly revenue given week-by-week inflows to the reservoir and electricity prices. The challenge is to decide between immediate water release at the spot price of electricity and storing the water for later power production at an unknown price, given constraints on the system. We successfully train a soft actor-critic algorithm on a simplified scenario with historical data from the Nordic power market. The presented model is not ready to substitute traditional optimisation tools but demonstrates the complementary potential of reinforcement learning in the data-rich field of hydropower scheduling.
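
The release-or-store trade-off can be sketched as a single environment step. This is a toy model built on assumptions, not the authors' simulator: the function name `reservoir_step`, the `capacity` value, the spill rule and the unit-free quantities are all hypothetical. The agent chooses what fraction of the available water to release at the current price; whatever remains is stored, and anything above capacity is spilled without earning revenue.

```python
def reservoir_step(storage, inflow, release_fraction, price, capacity=100.0):
    """Advance a hypothetical reservoir by one week.

    Returns (next_storage, revenue). Water released this week is sold at
    the current price; stored water can be sold later at a price that is
    not yet known.
    """
    release_fraction = min(1.0, max(0.0, release_fraction))  # keep in [0, 1]
    available = storage + inflow
    release = release_fraction * available
    remaining = available - release
    spill = max(0.0, remaining - capacity)  # overflow is lost
    next_storage = remaining - spill
    revenue = price * release
    return next_storage, revenue


if __name__ == "__main__":
    print(reservoir_step(storage=50.0, inflow=20.0, release_fraction=0.5, price=30.0))
    print(reservoir_step(storage=90.0, inflow=30.0, release_fraction=0.0, price=10.0))
```

A learning agent would call a step like this once per week and try to maximise the sum of revenues over the year.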


Interactive Search Based on Deep Reinforcement Learning

arXiv.org Artificial Intelligence

With the continuous development of machine learning technology, major e-commerce platforms have launched recommendation systems based on it to serve a large number of customers with different needs more efficiently. Compared with traditional supervised learning, reinforcement learning can better capture the user's state transitions in the decision-making process and consider a series of user actions, not just the static characteristics of the user at a certain moment. In theory, it takes a long-term perspective, producing more effective recommendations. The special data requirements of reinforcement learning mean that it must rely on an offline virtual system for training. Our project mainly establishes a virtual user environment for offline training. At the same time, we tried to improve a reinforcement learning algorithm based on bi-clustering to expand the action space and recommended path space of the recommendation agent.


An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks, showing excellent performance. The two have opposite properties: DRL has good sample efficiency but poor stability, while ES is the reverse. Recently, there have been attempts to combine these algorithms, but these methods rely fully on a synchronous update scheme, which is not ideal for maximizing the benefits of parallelism in ES. To address this challenge, an asynchronous update scheme was introduced, which offers good time efficiency and diverse policy exploration. In this paper, we introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) method that maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take full advantage of asynchronism, ES, and DRL, namely exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmark tasks, showing superior performance as well as time efficiency compared to previous methods.


Optimal oracle inequalities for solving projected fixed-point equations

arXiv.org Machine Learning

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that neither of these terms can be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.
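
As a rough illustration of the kind of scheme the abstract refers to, the sketch below runs TD(0) policy evaluation with linear function approximation and returns the Polyak-Ruppert average of the iterates. It is not the paper's algorithm or analysis. The three-state Markov reward process, the feature map, the constant step size and the name `td0_polyak` are all assumptions. The example is chosen so that the true value function, V(s) = s + 1, lies exactly in the span of the two features, so the averaged weights should settle near (1, 1).

```python
import random


def td0_polyak(num_steps=20000, gamma=0.5, step_size=0.05, seed=0):
    """TD(0) with linear features and Polyak-Ruppert iterate averaging.

    Hypothetical Markov reward process: states 0, 1, 2; the next state
    is uniform over all three states; the reward equals the current
    state index. Features are phi(s) = (1, s), a 2-dimensional subspace
    of the 3-dimensional space of value functions.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]      # current weight vector
    theta_avg = [0.0, 0.0]  # running average of all iterates
    state = rng.randrange(3)
    for t in range(1, num_steps + 1):
        next_state = rng.randrange(3)
        reward = float(state)
        phi = (1.0, float(state))
        phi_next = (1.0, float(next_state))
        value = sum(w * f for w, f in zip(theta, phi))
        value_next = sum(w * f for w, f in zip(theta, phi_next))
        # Temporal-difference error and stochastic approximation update.
        td_error = reward + gamma * value_next - value
        theta = [w + step_size * td_error * f for w, f in zip(theta, phi)]
        # Incremental form of the average of theta_1, ..., theta_t.
        theta_avg = [a + (w - a) / t for a, w in zip(theta_avg, theta)]
        state = next_state
    return theta_avg


if __name__ == "__main__":
    print(td0_polyak())
```

The individual iterates keep fluctuating because of the random next state, while their running average is much less noisy; that variance reduction is the usual motivation for averaging.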


AI Algorithm From Facebook Can Play Chess & Poker With Equal Ease

#artificialintelligence

In recent news, the research team at Facebook has introduced a general AI bot, ReBeL, that can play both perfect-information games, such as chess, and imperfect-information games, such as poker, with equal ease using reinforcement learning. As the company says, it is a big step towards creating a general AI algorithm that could perform well over a range of games. The researchers believe that this algorithm will have real-world applications, including dealing with negotiations, fraud detection, and even cybersecurity. AlphaZero from DeepMind rapidly caught the fancy of the AI research community when it was released back in 2017. An AI-based program that could play games like chess, shogi, and Go is not unheard of, but AlphaZero is different as it uses reinforcement learning with search (RL Search) to 'learn on its own' by mimicking the world-class players.