C-CoCoA: A Continuous Cooperative Constraint Approximation Algorithm to Solve Functional DCOPs
Sarker, Amit, Arif, Abdullahil Baki, Choudhury, Moumita, Khan, Md. Mosaddek
Distributed Constraint Optimization Problems (DCOPs) have been widely used to coordinate interactions (i.e. constraints) in cooperative multi-agent systems. The traditional DCOP model assumes that variables owned by the agents can take only discrete values and that constraints' cost functions are defined for every possible value assignment of a set of variables. While this formulation is often reasonable, there are many applications where the variables are continuous decision variables and constraints are in functional form. To overcome this limitation, the Functional DCOP (F-DCOP) model has been proposed, which is able to model problems with continuous variables. Existing F-DCOP algorithms incur huge computation and communication overhead. This paper applies continuous non-linear optimization methods to the Cooperative Constraint Approximation (CoCoA) algorithm. We empirically show that our algorithm is able to provide high-quality solutions with lower communication cost and execution time than the existing F-DCOP algorithms.
ConQUR: Mitigating Delusional Bias in Deep Q-learning
Su, Andy, Ooi, Jayden, Lu, Tyler, Schuurmans, Dale, Boutilier, Craig
Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
Jurgenson, Tom, Avner, Or, Groshev, Edward, Tamar, Aviv
Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states. Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal. Instead, we propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem, which naturally solves multi-goal queries. We show that this approach has computational benefits for both standard and approximate dynamic programming. Interestingly, our formulation prescribes a novel protocol for computing a trajectory: instead of predicting the next state given its predecessor, as in standard RL, a goal-conditioned trajectory is constructed by first predicting an intermediate state between start and goal, partitioning the trajectory into two. Intermediate points are then predicted recursively on each sub-segment until a complete trajectory is obtained. We call this trajectory structure a sub-goal tree. Building on it, we additionally extend the policy gradient methodology to recursively predict sub-goals, resulting in novel goal-based algorithms. Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles.
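The recursive construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the learned sub-goal predictor is replaced by a plain arithmetic midpoint, purely as an assumption to keep the example runnable.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]


def midpoint_predictor(start: State, goal: State) -> State:
    # Hypothetical stand-in for a learned sub-goal predictor (an assumption,
    # not the paper's model): it returns the arithmetic midpoint.
    return tuple((s + g) / 2 for s, g in zip(start, goal))


def build_subgoal_tree(
    start: State,
    goal: State,
    depth: int,
    predict: Callable[[State, State], State] = midpoint_predictor,
) -> List[State]:
    """Return the states from start to goal, split into 2**depth segments."""
    if depth == 0:
        return [start, goal]
    # Predict an intermediate state, then recurse on each half.
    mid = predict(start, goal)
    left = build_subgoal_tree(start, mid, depth - 1, predict)
    right = build_subgoal_tree(mid, goal, depth - 1, predict)
    # The sub-goal ends the left half and starts the right half; keep it once.
    return left + right[1:]


trajectory = build_subgoal_tree((0.0, 0.0), (4.0, 8.0), depth=2)
print(trajectory)
```

With depth 2 the sketch yields five states: the start, three predicted sub-goals, and the goal. The two recursive calls do not depend on each other, so the two halves could in principle be predicted in parallel.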
Hallucinative Topological Memory for Zero-Shot Visual Planning
Liu, Kara, Kurutach, Thanard, Tung, Christine, Abbeel, Pieter, Tamar, Aviv
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction. Most previous works on VP approached the problem by planning in a learned latent space, resulting in low-quality visual plans and difficult training algorithms. Here, instead, we propose a simple VP method that plans directly in image space and displays competitive performance. We build on the semi-parametric topological memory (SPTM) method: image samples are treated as nodes in a graph, the graph connectivity is learned from image sequence data, and planning can be performed using conventional graph search methods. We propose two modifications to SPTM. First, we train an energy-based graph connectivity function using contrastive predictive coding that admits stable training. Second, to allow zero-shot planning in new domains, we learn a conditional VAE model that generates images given a context of the domain, and use these hallucinated samples for building the connectivity graph and planning. We show that this simple approach significantly outperforms the state-of-the-art VP methods, in terms of both plan interpretability and success rate when using the plan to guide a trajectory-following controller. Interestingly, our method can pick up non-trivial visual properties of objects, such as their geometry, and account for them in the plans.
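The graph-based planning idea in this abstract (samples as nodes, a learned connectivity score deciding the edges, conventional graph search for the plan) can be sketched as follows. This is only an illustration under stated assumptions: the "images" are scalars, `toy_connectivity` is a hypothetical hand-written score standing in for the learned energy-based function, and the threshold is arbitrary.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Sequence


def build_graph(
    samples: Sequence[float],
    connectivity: Callable[[float, float], float],
    threshold: float = 0.5,
) -> Dict[int, List[int]]:
    """Connect sample i to sample j when the connectivity score is high enough."""
    edges: Dict[int, List[int]] = {i: [] for i in range(len(samples))}
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i != j and connectivity(a, b) >= threshold:
                edges[i].append(j)
    return edges


def plan(edges: Dict[int, List[int]], start: int, goal: int) -> Optional[List[int]]:
    """Breadth-first search; returns a shortest node sequence or None."""
    parents: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[int] = []
            current: Optional[int] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in edges[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def toy_connectivity(a: float, b: float) -> float:
    # Hypothetical score (an assumption): two samples count as connected
    # when they differ by at most 1.0.
    return 1.0 if abs(a - b) <= 1.0 else 0.0


samples = [0.0, 1.0, 2.0, 3.0, 10.0]
edges = build_graph(samples, toy_connectivity)
print(plan(edges, 0, 3))
print(plan(edges, 0, 4))
```

In the toy data the last sample is isolated, so the second query finds no plan. In the setting the abstract describes, the samples would be real or hallucinated images and the score would come from the trained connectivity model.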
Testing Monotonicity of Machine Learning Models
Sharma, Arnab, Wehrheim, Heike
Today, machine learning (ML) models are increasingly applied in decision making. This induces an urgent need for quality assurance of ML models with respect to (often domain-dependent) requirements. Monotonicity is one such requirement. It requires software 'learned' by an ML algorithm to give an increasing prediction as the values of certain attributes increase. While there exist multiple ML algorithms for ensuring monotonicity of the generated model, approaches for checking monotonicity, in particular of black-box models, are largely lacking. In this work, we propose verification-based testing of monotonicity, i.e., the formal computation of test inputs on a white-box model via verification technology, and the automatic inference of this approximating white-box model from the black-box model under test. On the white-box model, the space of test inputs can be systematically explored by a directed computation of test cases. The empirical evaluation on 90 black-box models shows verification-based testing can outperform adaptive random testing as well as property-based techniques with respect to effectiveness and efficiency.
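To make the monotonicity requirement concrete, the sketch below searches for a counterexample by random paired testing: two inputs that differ only in one attribute, where the larger attribute value produces a smaller prediction. This is a plain random baseline, not the verification-based testing the abstract proposes, and every name in it (including the two toy models) is hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def find_monotonicity_violation(
    model: Model,
    feature: int,
    bounds: Sequence[Tuple[float, float]],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[Tuple[List[float], List[float]]]:
    """Return a pair (x, y) with y[feature] >= x[feature] and
    model(y) < model(x), or None if no such pair is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = list(x)
        # Raise only the tested attribute; all other attributes stay fixed.
        y[feature] = rng.uniform(x[feature], bounds[feature][1])
        if model(y) < model(x):
            return x, y
    return None


def monotone_model(x: Sequence[float]) -> float:
    # Toy model (an assumption): increasing in attribute 0.
    return 2.0 * x[0] + x[1]


def non_monotone_model(x: Sequence[float]) -> float:
    # Toy model (an assumption): decreasing in attribute 0 once x[0] > 0.5.
    return -((x[0] - 0.5) ** 2) + x[1]


unit_square = [(0.0, 1.0), (0.0, 1.0)]
print(find_monotonicity_violation(monotone_model, 0, unit_square))
print(find_monotonicity_violation(non_monotone_model, 0, unit_square))
```

A returned pair is a witness that the model is not monotone in the tested attribute. A result of `None` only means no violation was found within the sampled trials, which is the gap that directed test generation aims to narrow.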
Learning Optimal Temperature Region for Solving Mixed Integer Functional DCOPs
Mahmud, Saaduddin, Khan, Md. Mosaddek, Choudhury, Moumita, Tran-Thanh, Long, Jennings, Nicholas R.
Distributed Constraint Optimization Problems (DCOPs) are an important framework for modeling coordinated decision-making problems in multi-agent systems with a set of discrete variables. Later work has extended this to model problems with a set of continuous variables (F-DCOPs). In this paper, we combine both of these models into the Mixed Integer Functional DCOP (MIF-DCOP) model, which can deal with problems regardless of their variables' types. We then propose a novel algorithm, called Distributed Parallel Simulated Annealing (DPSA), where agents cooperatively learn the optimal parameter configuration for the algorithm while also solving the given problem using the learned knowledge. Finally, we empirically benchmark our approach in DCOP, F-DCOP and MIF-DCOP settings and show that DPSA produces solutions of significantly better quality than the state-of-the-art non-exact algorithms in their corresponding settings.
Autonomous robotic nanofabrication with reinforcement learning
Leinen, Philipp, Esders, Malte, Schütt, Kristof T., Wagner, Christian, Müller, Klaus-Robert, Tautz, F. Stefan
The ability to handle single molecules as effectively as macroscopic building-blocks would enable the construction of complex supramolecular structures that are not accessible by self-assembly. The fundamental challenges on the way towards this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles, and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach employs reinforcement learning (RL), which is able to learn solution strategies even in the face of large uncertainty and with sparse feedback. However, to be useful for autonomous nanofabrication, standard RL algorithms need to be adapted to cope with the limited training opportunities available. We demonstrate the potential of our RL approach by applying it to an exemplary task of subtractive manufacturing, the removal of individual molecules from a molecular layer using a scanning probe microscope (SPM). Our RL agent reaches an excellent performance level, enabling us to automate a task which previously had to be performed by a human. We anticipate that our work opens the way towards autonomous agents for the robotic construction of functional supramolecular structures with speed, precision and perseverance beyond our current capabilities.
Learning to Walk in the Real World with Minimal Human Effort
Ha, Sehoon, Xu, Peng, Tan, Zhenyu, Levine, Sergey, Tan, Jie
Reliable and stable locomotion has been one of the most fundamental challenges for legged robots. Deep reinforcement learning (deep RL) has emerged as a promising method for developing such control policies autonomously. In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort. The key difficulties for on-robot learning systems are automatic data collection and safety. We overcome these two challenges by developing a multi-task learning procedure, an automatic reset controller, and a safety-constrained RL framework. We tested our system on the task of learning to walk on three different terrains: flat ground, a soft mattress, and a doormat with crevices. Our system can automatically and efficiently learn locomotion skills on a Minitaur robot with little human intervention.
Learning new skills online boosts annual pay by an average of £3,640
Online learning is boosting the average UK pay packet by £3,640, or £2 per hour, according to a new study. Britons are turning to the internet to learn new skills, which in turn is boosting the UK's economic output, according to the poll of 20,000 of the country's citizens. The Google-backed survey indicates that many working people have used internet-based learning to increase their pay or to get a new job. Work-related online learning that makes use of search engines, video and social media could benefit not only staff but also employers' productivity and economic heft. The report concludes that the more you learn the more you earn, as people's hunger for knowledge and self-improvement is directly impacting salaries and professional progression. 'What we found is really encouraging – not just for the businesses and organisations that are benefiting from upskilling employees, but in terms of the economy as a whole,' said Polly Mackenzie, chief executive of Demos, which carried out the survey.
Facial recognition software company reveals security breach exposing its entire client list
Facial recognition software provider Clearview AI has revealed that its entire client list was stolen by someone who 'gained unauthorized access' to company documents and data. According to a notice sent to its customers, Clearview AI said that in addition to its client list, the intruder had gained access to the number of user accounts associated with each client, as well as the number of searches conducted through those accounts. The company didn't specify how the security breach had occurred nor who might have been responsible, and it claimed its servers and internal network hadn't been compromised. 'Unfortunately, data breaches are part of life in the 21st century,' Clearview attorney Tor Ekeland told The Daily Beast, which broke the story. 'Our servers were never accessed.