Reinforcement Learning
Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking
Sanders, Kate, Danielczuk, Michael, Mahler, Jeffrey, Tanwani, Ajay, Goldberg, Ken
A new generation of automated bin picking systems using deep learning is evolving to support increasing demand for e-commerce. To accommodate a wide variety of products, many automated systems include multiple gripper types and/or tool changers. However, for some objects, sequential grasp failures are common: when a computed grasp fails to lift and remove the object, the bin is often left unchanged; because the sensor input is therefore unchanged, the system retries the same grasp over and over, resulting in a significant reduction in mean successful picks per hour (MPPH). Based on an empirical study of sequential failures, we characterize a class of "sequential failure objects" (SFOs) -- objects prone to sequential failures based on a novel taxonomy. We then propose three non-Markov picking policies that incorporate memory of past failures to modify subsequent actions. Simulation experiments on SFO models and the EGAD dataset suggest that the non-Markov policies significantly outperform the Markov policy in terms of the sequential failure rate and MPPH. In physical experiments on 50 heaps of 12 SFOs, the most effective non-Markov policy increased MPPH over the Dex-Net Markov policy by 107%.
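The abstract does not spell out the three policies, so the following is only a minimal sketch of the general idea of a picking policy with memory, not the authors' method. It assumes a hypothetical grasp representation (an (x, y) point), a hypothetical masking radius, and a list of candidate grasps with predicted quality scores; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Grasp = Tuple[float, float]  # hypothetical (x, y) grasp location


@dataclass
class MemoryPolicy:
    """Illustrative non-Markov wrapper: remembers failed grasps and avoids retrying them."""
    radius: float = 0.05  # assumed masking radius around a failed grasp
    failed: List[Grasp] = field(default_factory=list)

    def select(self, candidates: List[Tuple[Grasp, float]]) -> Grasp:
        """Return the highest-quality candidate that is not near a remembered failure."""
        allowed = [(g, q) for g, q in candidates if not self._near_failure(g)]
        pool = allowed or candidates  # fall back if every candidate is masked
        return max(pool, key=lambda gq: gq[1])[0]

    def record(self, grasp: Grasp, success: bool) -> None:
        """Update memory after an attempt."""
        if success:
            self.failed.clear()  # the bin changed, so old failures no longer apply
        else:
            self.failed.append(grasp)

    def _near_failure(self, g: Grasp) -> bool:
        return any(
            (g[0] - f[0]) ** 2 + (g[1] - f[1]) ** 2 <= self.radius ** 2
            for f in self.failed
        )
```

A memoryless (Markov) policy would return the same top-scoring grasp for the same sensor input; here, a recorded failure changes the next choice even when the candidate list is identical.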
AI Returns The Favour: Implications Of Deep RL In Neuroscience
"This is a great opportunity to continue the synergistic 'virtuous circle' that has connected neuroscience and AI for decades." Artificial neural networks occasionally get a bad rap for watering down the complexity of how a human brain works with over-the-top analogies. But there is no denying that popular algorithms were heavily inspired by how natural systems work. Now, after three decades of innovation and invention, AI as a domain has touched many functions of human cognition. From attention to memory to dreams, there is an active research space that is burgeoning with every passing day. Now a team of researchers at DeepMind is exploring the possibility of reverse engineering the results of algorithms to learn more about cognitive functions.
Reinforcement Communication Learning in Different Social Network Structures
Dubova, Marina, Moskvichev, Arseny, Goldstone, Robert
Social network structure is one of the key determinants of human language evolution. Previous work has shown that the network of social interactions shapes decentralized learning in human groups, leading to the emergence of different kinds of communicative conventions. We examined the effects of social network organization on the properties of communication systems emerging in decentralized, multi-agent reinforcement learning communities. We found that the global connectivity of a social network drives the convergence of populations on shared and symmetric communication systems, preventing the agents from forming many local "dialects". Moreover, the agent's degree is inversely related to the consistency of its use of communicative conventions. These results show the importance of the basic properties of social network structure on reinforcement communication learning and suggest a new interpretation of findings on human convergence on word conventions.
On Controllability of AI
The unprecedented progress in Artificial Intelligence (AI) [1-6] over the last decade came alongside multiple AI failures [7, 8] and cases of dual use [9], causing a realization [10] that it is not sufficient to create highly capable machines, but that it is even more important to make sure that intelligent machines are beneficial [11] for humanity. This led to the birth of a new subfield of research commonly known as AI Safety and Security [12], with hundreds of papers and books published annually on different aspects of the problem [13-31]. All such research is done under the assumption that the problem of controlling highly capable intelligent machines is solvable, which has not been established by any rigorous means. However, it is standard practice in computer science to first show that a problem doesn't belong to a class of unsolvable problems [32, 33] before investing resources into trying to solve it or deciding what approaches to try. Unfortunately, to the best of our knowledge, no mathematical proof or even rigorous argumentation has been published demonstrating that the AI control problem may be solvable, even in principle, much less in practice. Or as Gans puts it citing Bostrom: "Thusfar, AI researchers and philosophers have not been able to come up with methods of control that would ensure [bad] outcomes did not take place …" [34].
Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities
Mei, Jincheng, Pan, Yangchen, White, Martha, Farahmand, Amir-massoud, Yao, Hengshuai
Model-based reinforcement learning (MBRL) can significantly improve sample efficiency, particularly when carefully choosing the states from which to sample hypothetical transitions. Such prioritization has been empirically shown to be useful for both experience replay (ER) and Dyna-style planning. However, there is as yet little theoretical understanding in RL about such prioritization strategies and why they help. In this work, we revisit prioritized ER and, in an ideal setting, show an equivalence to minimizing cubic loss, providing theoretical insight into why it improves upon uniform sampling. This ideal setting, however, cannot be realized in practice, due to insufficient coverage of the sample space and outdated priorities of training samples. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high-priority states using gradient ascent. Under certain conditions, we prove that the distribution of hypothetical experiences generated from these states provides a diverse set of states, sampled proportionally to approximately true priorities. Our experiments on both benchmark and application-oriented domains show that our approach achieves superior performance over both the model-free prioritized ER method and several closely related model-based baselines.
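The link between prioritized sampling and a cubic loss can be illustrated numerically. The sketch below is not the paper's derivation; it assumes priorities proportional to the absolute TD error and, for simplicity, takes gradients with respect to the TD error itself. Under those assumptions, the expected squared-loss gradient under prioritized sampling is a positive multiple of the cubic-loss gradient under uniform sampling. Function names are hypothetical.

```python
import random


def priority_probs(td_errors, eps=1e-6):
    """Sampling probabilities proportional to |TD error| (eps keeps them nonzero)."""
    priorities = [abs(d) + eps for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, k, rng):
    """Draw k buffer indices with probability proportional to priority."""
    probs = priority_probs(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=k)


def prioritized_sq_grad(td_errors):
    """Expected gradient of 0.5 * d**2 (which is d) under prioritized sampling."""
    probs = priority_probs(td_errors, eps=0.0)
    return sum(p * d for p, d in zip(probs, td_errors))


def uniform_cubic_grad(td_errors):
    """Expected gradient of (1/3) * |d|**3 (which is |d| * d) under uniform sampling."""
    return sum(abs(d) * d for d in td_errors) / len(td_errors)


if __name__ == "__main__":
    deltas = [1.0, -2.0, 3.0]
    scale = len(deltas) / sum(abs(d) for d in deltas)
    # The two expected gradients agree up to the positive factor `scale`.
    print(prioritized_sq_grad(deltas), scale * uniform_cubic_grad(deltas))
    print(sample_indices(deltas, 5, random.Random(0)))
```

The scaling factor depends only on the current set of TD errors, so it changes the step size but not the direction of the update.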
Quick Question: Interrupting Users for Microtasks with Reinforcement Learning
Ho, Bo-Jhang, Balaji, Bharathan, Koseoglu, Mehmet, Sandha, Sandeep, Pei, Siyou, Srivastava, Mani
Human attention is a scarce resource in modern computing. A multitude of microtasks vie for user attention to crowdsource information, perform momentary assessments, personalize services, and execute actions with a single touch. A lot gets done when these tasks take up the invisible free moments of the day. However, an interruption at an inappropriate time degrades productivity and causes annoyance. Prior works have exploited contextual cues and behavioral data to identify interruptibility for microtasks with much success. With Quick Question, we explore the use of reinforcement learning (RL) to schedule microtasks while minimizing user annoyance and compare its performance with supervised learning. We model the problem as a Markov decision process and use the Advantage Actor-Critic algorithm to identify interruptible moments based on context and history of user interactions. In our 5-week, 30-participant study, we compare the proposed RL algorithm against supervised learning methods. While the mean number of responses is comparable between the two methods, RL is more effective at avoiding dismissal of notifications and improves user experience over time.
Same-Day Delivery with Fairness
Chen, Xinwei, Wang, Tong, Thomas, Barrett W., Ulmer, Marlin W.
The demand for same-day delivery (SDD) has increased rapidly in the last few years and has particularly boomed during the COVID-19 pandemic. Existing literature on the problem has focused on maximizing the utility, represented as the total number of expected requests served. However, a utility-driven solution results in unequal opportunities for customers to receive delivery service, raising questions about fairness. In this paper, we study the problem of achieving fairness in SDD. We construct a regional-level fairness constraint that ensures customers from different regions have an equal chance of being served. We develop a reinforcement learning model to learn policies that focus on both overall utility and fairness. Experimental results demonstrate the ability of our approach to mitigate the unfairness caused by geographic differences and resource constraints, at both coarser and finer-grained levels and with a small cost to utility. In addition, we simulate a real-world situation where the system is suddenly overwhelmed by a surge of requests, mimicking the COVID-19 scenario. Our model is robust to the systematic pressure and is able to maintain fairness with little compromise to the utility.
Multi-Principal Assistance Games
Fickinger, Arnaud, Zhuang, Simon, Hadfield-Menell, Dylan, Russell, Stuart
Assistance games (also known as cooperative inverse reinforcement learning games) have been proposed as a model for beneficial AI, wherein a robotic agent must act on behalf of a human principal but is initially uncertain about the human's payoff function. This paper studies multi-principal assistance games, which cover the more general case in which the robot acts on behalf of N humans who may have widely differing payoffs. Impossibility theorems in social choice theory and voting theory can be applied to such games, suggesting that strategic behavior by the human principals may complicate the robot's task in learning their payoffs. We analyze in particular a bandit apprentice game in which the humans act first to demonstrate their individual preferences for the arms and then the robot acts to maximize the sum of human payoffs. We explore the extent to which the cost of choosing suboptimal arms reduces the incentive to mislead, a form of natural mechanism design. In this context we propose a social choice method that uses shared control of a system to combine preference inference with social welfare optimization.
Artificial Intelligence for Business
Solve Real World Business Problems with AI Solutions. Created by Hadelin de Ponteves, Kirill Eremenko, and the SuperDataScience Team. Structure of the course: Part 1 - Optimizing Business Processes (case study: optimizing the flows in an e-commerce warehouse; AI solution: Q-Learning). Part 2 - Minimizing Costs (case study: minimizing the costs in energy consumption of a data center; AI solution: Deep Q-Learning). Part 3 - Maximizing Revenues (case study: maximizing revenue of an online retail business; AI solution: Thompson Sampling). Real World Business Applications: With Artificial Intelligence, you can do three main things for any business: optimize business processes, minimize costs, and maximize revenues. We will show you exactly how to succeed in these applications through real-world business case studies, and for each of these applications we will build a separate AI to solve the challenge. In Part 1 - Optimizing Processes, we will build an AI that will optimize the flows in an e-commerce warehouse. In Part 2 - Minimizing Costs, we will build a more advanced AI that will minimize the costs in energy consumption of a data center by more than 50%, just as Google did last year thanks to DeepMind.
WordCraft: An Environment for Benchmarking Commonsense Agents
Jiang, Minqi, Luketina, Jelena, Nardelli, Nantas, Minervini, Pasquale, Torr, Philip H. S., Whiteson, Shimon, Rocktäschel, Tim
The ability to quickly solve a wide range of real-world tasks requires a commonsense understanding of the world. Yet, how to best extract such knowledge from natural language corpora and integrate it with reinforcement learning (RL) agents remains an open challenge. This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment. To better enable research on agents making use of commonsense knowledge, we propose WordCraft, an RL environment based on Little Alchemy 2. This lightweight environment is fast to run and built upon entities and relations inspired by real-world semantics. We evaluate several representation learning methods on this new benchmark and propose a new method for integrating knowledge graphs with an RL agent.