 Reinforcement Learning


Efficient Multi-robot Exploration via Multi-head Attention-based Cooperation Strategy

arXiv.org Artificial Intelligence

The goal of coordinated multi-robot exploration tasks is to employ a team of autonomous robots to explore an unknown environment as quickly as possible. Compared with human-designed methods, which began with heuristic and rule-based approaches, learning-based methods enable individual robots to learn sophisticated and hard-to-design cooperation strategies through deep reinforcement learning. However, in decentralized multi-robot exploration tasks, learning-based algorithms are still far from being universally applicable to continuous space because of the difficulties associated with area calculation and reward function design; moreover, existing learning-based methods encounter problems when attempting to balance the historical trajectory issue and the target area conflict problem. Furthermore, these methods scale poorly to a large number of agents because of the exponential explosion of the state space. Accordingly, this paper proposes a novel approach - Multi-head Attention-based Multi-robot Exploration in Continuous Space (MAMECS) - aimed at reducing the state space and automatically learning the cooperation strategies required for decentralized multi-robot exploration tasks in continuous space. Computational geometry knowledge is applied to describe the environment in continuous space and to design an improved reward function to ensure a superior exploration rate. Moreover, the multi-head attention mechanism employed helps to solve the historical trajectory issue in the decentralized multi-robot exploration task, as well as to reduce the quadratic increase of the action space.
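To make the multi-head attention idea concrete, the following is a minimal sketch of generic scaled dot-product attention in which one robot attends over its teammates' feature vectors. It is not the MAMECS architecture, which the abstract does not specify: the function names, feature sizes, and random weights are all assumptions chosen for illustration. The point it shows is that the attended summary has a fixed length no matter how many teammates are present.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def project(vec, matrix):
    """Multiply a length-d vector by a d x d_head matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

def multi_head_attention(own, others, heads):
    """Attend from one robot's features over its teammates' features.

    own:    length-d feature vector of the attending robot (hypothetical).
    others: list of length-d feature vectors, one per teammate.
    heads:  list of (w_q, w_k, w_v) tuples, each a d x d_head matrix.
    Returns a context vector of length len(heads) * d_head.
    """
    context = []
    for w_q, w_k, w_v in heads:
        q = project(own, w_q)
        keys = [project(o, w_k) for o in others]
        values = [project(o, w_v) for o in others]
        scale = math.sqrt(len(q))
        # One attention weight per teammate; the weights sum to 1.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Weighted sum of the teammates' value vectors for this head.
        context += [sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(q))]
    return context

def random_matrix(rng, rows, cols):
    """Random weights stand in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, d_head, n_heads = 4, 3, 2
heads = [tuple(random_matrix(rng, d, d_head) for _ in range(3)) for _ in range(n_heads)]
own = [rng.gauss(0.0, 1.0) for _ in range(d)]
for n_teammates in (3, 7):
    others = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_teammates)]
    print(n_teammates, len(multi_head_attention(own, others, heads)))
```

Because the context vector keeps the same length for 3 or 7 teammates, a policy network that consumes it does not need an input that grows with team size.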


Worker robots that learn from mistakes

#artificialintelligence

Computer scientists at the University of Leeds are using the artificial intelligence (AI) techniques of automated planning and reinforcement learning to "train" a robot to find an object in a cluttered space, such as a warehouse shelf or in a fridge -- and move it. The aim is to develop robotic autonomy, so the machine can assess the unique circumstances presented in a task and find a solution -- akin to a robot transferring skills and knowledge to a new problem. The Leeds researchers are presenting their findings today (Monday, November 4) at the International Conference on Intelligent Robotics and Systems in Macau, China. The big challenge is that in a confined area, a robotic arm may not be able to grasp an object from above. Instead it has to plan a sequence of moves to reach the target object, perhaps by manipulating other items out of the way.


Automation via Reinforcement Learning

#artificialintelligence

The dream of reinforcement learning is that it can one day be used to derive automated solutions to real-world tasks, with little-to-no human effort. Unfortunately, in its current state, RL fails to deliver. There have been basically no real-world problems solved by DRL; even on toy problems, the solutions found are often brittle and fail to generalize to new environments. This means that the per-task human effort (i.e. task-specific engineering effort and hyperparameter tuning) is quite high. Algorithms are sample-inefficient, making them expensive in terms of both data collection effort and compute effort, too.


Success Stories of Reinforcement Learning

#artificialintelligence

In September 2018, I got the opportunity to attend the Deep Learning Indaba conference, held at Stellenbosch University, South Africa. Deep Learning Indaba was formed to strengthen African machine learning, to increase African participation in and contribution to advances in artificial intelligence and machine learning, and to address issues of diversity in these fields. One of the lectures that I really enjoyed was on Success Stories of Reinforcement Learning, where we were introduced to reinforcement learning and to how it has been used to build some pretty awesome computer programs. The lecture was presented by David Silver. Professor Silver leads the reinforcement learning research group at DeepMind, an AI company based in London that was acquired by Google in 2014.


DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning

arXiv.org Artificial Intelligence

DeepRacer is a platform for end-to-end experimentation with RL and can be used to systematically investigate the key challenges in developing intelligent control systems. Using the platform, we demonstrate how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera. It is trained in simulation with no additional tuning in the physical world and demonstrates: 1) formulation and solution of a robust reinforcement learning algorithm, 2) narrowing the reality gap through joint perception and dynamics, 3) distributed on-demand compute architecture for training optimal policies, and 4) a robust evaluation method to identify when to stop training. It is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning. Due to high sample complexity and safety requirements, it is common to train the RL agent in simulation [1], [5], [17]. To reduce training time and encourage exploration, the agent is usually trained with distributed rollouts [18], [19], [20], [21]. For a successful transfer to the real world, researchers use calibration [2], [22], domain randomization [23], [24], [25], [12], fine-tuning with real-world data [9], and learning features from a combination of simulation and real data [26], [27]. To experiment with robotic reinforcement learning, one needs expertise in many areas, access to a physical robot, an accurate robot model for simulations, a distributed training mechanism, and a customizable training procedure (for example, modifying the neural network and the loss function or introducing noise). For the uninitiated, dealing with this complexity is daunting and dissuades adoption. As a result, much of the prior work is limited to a single robot [1], [23], [28] or a few robots [16]. We reduce the learning curve and alleviate development effort with DeepRacer.


Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

arXiv.org Artificial Intelligence

Ramtin Keramati 1, Christoph Dann 2, Alex Tamkin 3, Emma Brunskill 3. 1 Institute of Computational and Mathematical Engineering (ICME), Stanford University, California, USA; 2 Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; 3 Department of Computer Science, Stanford University, California, USA. {keramati,atamkin,ebrun}@cs.stanford.edu

Abstract. While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.

Introduction. A key goal in reinforcement learning (RL) is to quickly learn to make good decisions by interacting with an environment. In most cases the quality of the decision policy is evaluated with respect to its expected (discounted) sum of rewards. However, in many interesting cases, it is important to consider the full distribution over the potential sum of rewards, and the desired objective may be a risk-sensitive measure of this distribution. For example, a patient undergoing surgery for a knee replacement will (hopefully) only experience that procedure once or twice, and may well be interested in the distribution of potential results for a single procedure, rather than what may happen on average if he or she were to undertake that procedure hundreds of times. Finance and (machine) control are other cases where interest in risk-sensitive outcomes is common. A popular risk-sensitive measure of a distribution of outcomes is the Conditional Value at Risk (CVaR) (Artzner et al. 1999). Intuitively, CVaR is the expected reward in the worst α-fraction of outcomes, and has seen extensive use in financial portfolio optimization (Zhu and Fukushima 2009), often under the name "expected shortfall".
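The intuitive definition above can be illustrated with a small empirical estimate: sort sampled returns and average the worst fraction of them. This is a minimal sketch under simple assumptions (a finite sample, equal weights, a tail size rounded up to a whole number of samples); the function name and the sample data are hypothetical and are not taken from the paper.

```python
import math

def empirical_cvar(returns, alpha):
    """Average of the worst alpha-fraction of sampled returns (lower tail)."""
    ordered = sorted(returns)                       # ascending: worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))     # number of samples in the tail
    return sum(ordered[:k]) / k

# Hypothetical returns: mostly good outcomes with two rare bad ones.
returns = [10.0, 12.0, -50.0, 11.0, 9.0, 13.0, 8.0, 10.0, 11.0, -20.0]

print(sum(returns) / len(returns))          # expected return: 1.4
print(empirical_cvar(returns, alpha=0.2))   # mean of the worst 20%: -35.0
```

The expected return hides the two bad outcomes, while the 20% CVaR is determined entirely by them, which is why a risk-sensitive objective can rank policies differently from the mean.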


Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

arXiv.org Artificial Intelligence

While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward shaping otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance.
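As a rough illustration of a distance-based reward computed from a pair of rollouts, the sketch below adds to the naive distance-to-goal term a bonus for ending far from the terminal state of a paired ("sibling") rollout. The abstract does not give the exact reward, so this additive form, the function names, and the 2D coordinates are all assumptions made for illustration.

```python
import math

def naive_reward(final_state, goal):
    """Naive shaping: negative Euclidean distance from the final state to the goal."""
    return -math.dist(final_state, goal)

def paired_rollout_reward(final_state, sibling_final_state, goal):
    """Assumed form: naive shaping plus the distance to the paired rollout's final state."""
    return naive_reward(final_state, goal) + math.dist(final_state, sibling_final_state)

goal = (8.0, 0.0)
stuck = (4.0, 0.0)      # hypothetical local optimum where rollouts tend to end
explorer = (4.0, 3.0)   # hypothetical rollout that ended somewhere else

# Two rollouts that collapse onto the same point earn no diversity bonus.
print(paired_rollout_reward(stuck, stuck, goal))      # -4.0
# The same end point scores higher when its pair ended elsewhere.
print(paired_rollout_reward(stuck, explorer, goal))   # -1.0
```

Under this assumed form, a policy whose rollouts all settle at the same local optimum receives only the naive reward, so there is pressure to keep paired rollouts apart.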


Learning One-Shot Imitation from Humans without Humans

arXiv.org Artificial Intelligence

Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability to imitate humans from a third-person perspective is a very immediate and natural way of teaching new tasks. Only recently, through meta-learning, have there been successful attempts at one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the data in the real world to train the robot. But is there a way to remove the need for real-world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control policies by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks both in simulation and in the real world, we show that, in comparison to a system trained on real-world data, we are able to achieve similar results by utilising only simulation data.


Visiting the SOSP 2019 AI System Workshop

#artificialintelligence

The ACM Symposium on Operating Systems Principles (SOSP) has a long history and a great reputation in Operating Systems (OS) research. This year SOSP was held in Huntsville, a charming town located in lake country, some 200km north of Toronto. On a rainy Sunday, Synced visited Huntsville to check out the SOSP AI System Workshop. The growing and widespread deployment of AI has motivated OS researchers to develop novel system engineering for AI. The SOSP AI System Workshop explored these efforts to advance research in AI and operating systems.


An End-to-End Deep RL Framework for Task Arrangement in Crowdsourcing Platforms

arXiv.org Machine Learning

In this paper, we propose a Deep Reinforcement Learning (RL) framework for task arrangement, which is a critical problem for the success of crowdsourcing platforms. Previous works conduct the personalized recommendation of tasks to workers via supervised learning methods. However, the majority of them only consider the benefit of either workers or requesters independently. In addition, they cannot handle the dynamic environment and may produce sub-optimal results. To address these issues, we utilize Deep Q-Network (DQN), an RL-based method combined with a neural network to estimate the expected long-term return of recommending a task. DQN inherently considers the immediate and future reward simultaneously and can be updated in real-time to deal with evolving data and dynamic changes. Furthermore, we design two DQNs that capture the benefit of both workers and requesters and maximize the profit of the platform. To learn value functions in DQN effectively, we also propose novel state representations, carefully design the computation of Q values, and predict transition probabilities and future states. Experiments on synthetic and real datasets demonstrate the superior performance of our framework.
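The statement that DQN considers immediate and future reward together corresponds to the standard one-step Q-learning target, sketched below. Only the generic target is shown: the paper's state representation, its two-network design, and its transition prediction are not reproduced, and the function name, reward, and Q-values are hypothetical.

```python
def td_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step Q-learning target: immediate reward plus discounted best future value.

    reward:        immediate reward for the recommended task (hypothetical).
    next_q_values: estimated Q-values of the actions available in the next state.
    gamma:         discount factor weighting future reward against immediate reward.
    done:          True if the episode ended, in which case there is no future term.
    """
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)

# Hypothetical example: a reward of 1.0 now, with the best next action valued at 2.0.
print(td_target(1.0, [0.5, 2.0, 1.0]))
# With no future state, only the immediate reward remains.
print(td_target(1.0, [0.5, 2.0, 1.0], done=True))
```

In a full DQN, the network is trained to move its estimate of Q(state, action) toward this target, which is how new interaction data can keep updating the value estimates.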