Undirected Networks
Entity Abstraction in Visual Model-Based Reinforcement Learning
Veerapaneni, Rishi, Co-Reyes, John D., Chang, Michael, Janner, Michael, Finn, Chelsea, Wu, Jiajun, Tenenbaum, Joshua B., Levine, Sergey
This paper tests the hypothesis that modeling a scene in terms of entities and their local interactions, as opposed to modeling the scene globally, provides a significant benefit in generalizing to physical tasks in a combinatorial space the learner has not encountered before. We present object-centric perception, prediction, and planning (OP3), which to the best of our knowledge is the first entity-centric dynamic latent variable framework for model-based reinforcement learning that acquires entity representations from raw visual observations without supervision and uses them to predict and plan. OP3 enforces entity-abstraction -- symmetric processing of each entity representation with the same locally-scoped function -- which enables it to scale to model different numbers and configurations of objects from those in training. Our approach to solving the key technical challenge of grounding these entity representations to actual objects in the environment is to frame this variable binding problem as an inference problem, and we develop an interactive inference algorithm that uses temporal continuity and interactive feedback to bind information about object properties to the entity variables. On block-stacking tasks, OP3 generalizes to novel block configurations and more objects than observed during training, outperforming an oracle model that assumes access to object supervision and achieving two to three times better accuracy than a state-of-the-art video prediction model.
On Connections between Constrained Optimization and Reinforcement Learning
Vieillard, Nino, Pietquin, Olivier, Geist, Matthieu
Dynamic Programming (DP) provides standard algorithms to solve Markov Decision Processes. However, these algorithms generally do not optimize a scalar objective function. In this paper, we draw connections between DP and (constrained) convex optimization. Specifically, we show clear links in the algorithmic structure between three DP schemes and optimization algorithms. We link Conservative Policy Iteration to Frank-Wolfe, Mirror-Descent Modified Policy Iteration to Mirror Descent, and Politex (Policy Iteration Using Expert Prediction) to Dual Averaging. These abstract DP schemes are representative of a number of (deep) Reinforcement Learning (RL) algorithms. By highlighting these connections (most of which have been noticed earlier, but in a scattered way), we would like to encourage further studies linking RL and convex optimization, that could lead to the design of new, more efficient, and better understood RL algorithms.
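The structural link between Conservative Policy Iteration and Frank-Wolfe can be seen in a small sketch. Frank-Wolfe over the probability simplex repeatedly mixes the current iterate with a greedy vertex, x_{k+1} = (1 - alpha) x_k + alpha s_k, which has the same form as the conservative policy update that mixes the current policy with a greedy one. The quadratic objective, the `target` vector, and the 2/(k+2) step size below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_steps=200):
    """Minimize a smooth convex function over the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        g = grad(x)
        # Linear minimization oracle: on the simplex, <g, s> is minimized
        # at the vertex with the smallest gradient coordinate (a greedy choice).
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        alpha = 2.0 / (k + 2.0)  # common Frank-Wolfe step size schedule
        # Conservative mixture step, analogous to mixing the current policy
        # with a greedy policy.
        x = (1.0 - alpha) * x + alpha * s
    return x

# Hypothetical objective: squared Euclidean distance to a target distribution.
target = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: 2.0 * (x - target),
                             np.array([1.0, 0.0, 0.0]))
```

Because every iterate is a convex combination of simplex vertices, it stays a valid probability vector without any projection step.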
Certified Adversarial Robustness for Deep Reinforcement Learning
Lütjens, Björn, Everett, Michael, How, Jonathan P.
Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which has already been shown to cause an autonomous vehicle to swerve into oncoming traffic. In light of these dangers, numerous algorithms have been developed as defensive mechanisms against these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certified defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose the optimal action under a worst-case deviation in input space due to possible adversaries or noise. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task.
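The idea of choosing the action with the best guaranteed lower bound can be sketched as follows. The abstract does not say how the bounds are computed, so this example substitutes plain interval bound propagation through a small ReLU network as a stand-in. The network shape, the random weights, and the function names are all hypothetical.

```python
import numpy as np

def interval_bounds(weights, biases, x, eps):
    """Propagate the box [x - eps, x + eps] through a ReLU MLP.

    Returns elementwise lower and upper bounds on the network output.
    """
    lo, hi = x - eps, x + eps
    n_layers = len(weights)
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        # Positive weights carry lower bounds to lower bounds,
        # negative weights swap the roles of lo and hi.
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        lo, hi = new_lo, new_hi
        if i < n_layers - 1:  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def forward(weights, biases, x):
    """Ordinary forward pass of the same ReLU MLP."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def robust_action(weights, biases, obs, eps):
    """Pick the action whose worst-case Q-value lower bound is largest."""
    q_lo, _ = interval_bounds(weights, biases, obs, eps)
    return int(np.argmax(q_lo)), q_lo

# Hypothetical Q-network: 4 observation inputs, 8 hidden units, 3 actions.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
biases = [rng.normal(size=8), rng.normal(size=3)]
obs = rng.normal(size=4)
action, q_lower = robust_action(weights, biases, obs, eps=0.05)
```

With `eps=0` the bounds collapse to the ordinary Q-values, so the rule reduces to standard greedy action selection. Interval bounds are typically looser than those from dedicated certification methods, so this sketch only illustrates the decision rule.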
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Igl, Maximilian, Ciosek, Kamil, Li, Yingzhen, Tschiatschek, Sebastian, Zhang, Cheng, Devlin, Sam, Hofmann, Katja
The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL. In particular, we focus on regularization techniques relying on the injection of noise into the learned function, a family that includes some of the most widely used approaches such as Dropout and Batch Normalization. To adapt them to RL, we propose Selective Noise Injection (SNI), which maintains the regularizing effect the injected noise has, while mitigating the adverse effects it has on the gradient quality. Furthermore, we demonstrate that the Information Bottleneck (IB) is a particularly well suited regularization technique for RL as it is effective in the low-data regime encountered early on in training RL agents. Combining the IB with SNI, we significantly outperform current state of the art results, including on the recently proposed generalization benchmark Coinrun.
Large-Scale Characterization and Segmentation of Internet Path Delays with Infinite HMMs
Mouchet, Maxime, Vaton, Sandrine, Chonavel, Thierry, Aben, Emile, Hertog, Jasper den
Round-Trip Times are one of the most commonly collected performance metrics in computer networks. Measurement platforms such as RIPE Atlas provide researchers and network operators with an unprecedented amount of historical Internet delay measurements. It would be very useful to automate the processing of these measurements (statistical characterization of path performance, change detection, recognition of recurring patterns, etc.). Humans are good at finding patterns in network measurements, but it can be difficult to automate this so that many time series can be processed at the same time. In this article we introduce a new model, the HDP-HMM or infinite hidden Markov model, whose performance in trace segmentation is very close to human cognition. This comes at the cost of greater complexity, and the ambition of this article is to make the theory accessible to network monitoring and management researchers. We demonstrate that this model provides very accurate results on a labeled dataset and on RIPE Atlas and CAIDA MANIC data. This method has been implemented in Atlas and we introduce the publicly accessible Web API.
11 Alternatives To Keras For Deep Learning Enthusiasts
Infer.NET is a machine learning framework for running Bayesian inference in graphical models. It provides state-of-the-art message-passing algorithms and statistical routines needed to perform inference for a wide variety of applications. The framework offers a rich modelling language and multiple inference algorithms, is designed for large-scale inference, and is user-extendable. With the help of this framework, various Bayesian models such as Bayes Point Machine classifiers, TrueSkill matchmaking, hidden Markov models, and Bayesian networks can be implemented with ease.
Online Gaussian LDA for Unsupervised Pattern Mining from Utility Usage Data
Mohamad, Saad, Bouchachia, Abdelhamid
Non-intrusive load monitoring (NILM) aims at separating a whole-home energy signal into its appliance components. Such a method can be harnessed to provide various services to better manage and control energy consumption (optimal planning and saving). NILM has been traditionally approached from signal processing and electrical engineering perspectives. Recently, machine learning has started to play an important role in NILM. While most work has focused on supervised algorithms, unsupervised approaches can be more interesting and of practical use in real case scenarios. Specifically, they do not require labelled training data to be acquired from individual appliances and the algorithm can be deployed to operate on the measured aggregate data directly. In this paper, we propose a fully unsupervised NILM framework based on Bayesian hierarchical mixture models. In particular, we develop a new method based on Gaussian Latent Dirichlet Allocation (GLDA) in order to extract global components that summarise the energy signal. These components provide a representation of the consumption patterns. Designed to cope with big data, our algorithm, unlike existing NILM ones, does not focus on appliance recognition. To handle this massive data, GLDA works online. Another novelty of this work compared to the existing NILM is that the data involves different utilities (e.g., electricity, water and gas) as well as some sensor measurements. Finally, we propose different evaluation methods to analyse the results which show that our algorithm finds useful patterns.
On the convergence of projective-simulation-based reinforcement learning in Markov decision processes
Clausen, Jens, Boyajian, Walter L., Trenkwalder, Lea M., Dunjko, Vedran, Briegel, Hans J.
In recent years, the interest in leveraging quantum effects for enhancing machine learning tasks has significantly increased. Many algorithms speeding up supervised and unsupervised learning were established. The first framework in which ways to exploit quantum resources specifically for the broader context of reinforcement learning were found is projective simulation. Projective simulation presents an agent-based reinforcement learning approach designed in a manner which may support quantum walk-based speed-ups. Although classical variants of projective simulation have been benchmarked against common reinforcement learning algorithms, very few formal theoretical analyses have been provided for its performance in standard learning scenarios. In this paper, we provide a detailed formal discussion of the properties of this model. Specifically, we prove that one version of the projective simulation model, understood as a reinforcement learning approach, converges to optimal behavior in a large class of Markov decision processes. This proof shows that a physically-inspired approach to reinforcement learning can be guaranteed to converge.
On the geometry of learning neural quantum states
Park, Chae-Yeun, Kastoryano, Michael J.
Combining insights from machine learning and quantum Monte Carlo, the stochastic reconfiguration method with neural network Ansatz states is a promising new direction for high precision ground state estimation of quantum many body problems. At present, the method is heuristic, lacking a proper theoretical foundation. We initiate a thorough analysis of the learning landscape, and show that it reveals universal behavior reflecting a combination of the underlying physics and of the learning dynamics. In particular, the spectrum of the quantum Fisher matrix of complex restricted Boltzmann machine states can dramatically change across a phase transition. In contrast to the spectral properties of the quantum Fisher matrix, the actual weights of the network at convergence do not reveal much information about the system or the dynamics. Furthermore, we identify a new measure of correlation in the state by analyzing entanglement in the eigenvectors. We show that, generically, the learning landscape modes with least entanglement have largest eigenvalue, suggesting that correlations are encoded in large flat valleys of the learning landscape, favoring stable representations of the ground state.
Task-Motion Planning for Navigation in Belief Space
Thomas, Antony, Mastrogiovanni, Fulvio, Baglietto, Marco
We present an integrated Task-Motion Planning (TMP) framework for navigation in large-scale environments. Autonomous robots operating in complex real-world scenarios require planning in the discrete (task) space and the continuous (motion) space. In knowledge-intensive domains, on the one hand, a robot has to reason at the highest level, for example about the regions to navigate to; on the other hand, the feasibility of the respective navigation tasks has to be checked at the execution level. This presents a need for motion-planning-aware task planners. We discuss a probabilistically complete approach that leverages this task-motion interaction for navigating in indoor domains, returning a plan that is optimal at the task level. Furthermore, our framework is intended for motion planning under motion and sensing uncertainty, which is formally known as belief space planning. The underlying methodology is validated with a simulated office environment in Gazebo. In addition, we discuss the limitations and provide suggestions for improvements and future work. 1 Introduction: Autonomous robots operating in complex real-world scenarios require different levels of planning to execute their tasks. High-level (task) planning helps break down a given set of tasks into a sequence of sub-tasks. Actual execution of each of these sub-tasks would require low-level control actions to generate appropriate robot motions. In fact, the dependency between logical and geometrical aspects is pervasive in both task planning and execution.