Energy
Learn to Effectively Explore in Context-Based Meta-RL
Zhang, Jin, Wang, Jianhao, Hu, Hao, Chen, Yingfeng, Fan, Changjie, Zhang, Chongjie
Meta reinforcement learning (meta-RL) provides a principled approach for fast adaptation to novel tasks by extracting prior knowledge from previous tasks. Under such settings, it is crucial for the agent to perform efficient exploration during adaptation to collect useful experiences. However, existing methods suffer from poor adaptation performance caused by inefficient exploration mechanisms, especially in sparse-reward problems. In this paper, we present a novel off-policy context-based meta-RL approach that efficiently learns a separate exploration policy to support fast adaptation, as well as a context-aware exploitation policy to maximize extrinsic return. The explorer is motivated by an information-theoretic intrinsic reward that encourages the agent to collect experiences that provide rich information about the task. Experimental results on both MuJoCo and Meta-World benchmarks show that our method significantly outperforms baselines by exploring efficiently.
QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning
Cideron, Geoffrey, Pierrot, Thomas, Perrin, Nicolas, Beguir, Karim, Sigaud, Olivier
We propose a novel reinforcement learning algorithm, QD-RL, that incorporates the strengths of off-policy RL algorithms into Quality-Diversity (QD) approaches. Quality-Diversity methods contribute structural biases by decoupling the search for diversity from the search for high return, resulting in efficient management of the exploration-exploitation trade-off. However, these approaches generally suffer from sample inefficiency as they call upon evolutionary techniques. QD-RL removes this limitation by relying on off-policy RL algorithms. More precisely, we train a population of off-policy deep RL agents to simultaneously maximize diversity inside the population and the return of the agents. QD-RL selects agents from the diversity-return Pareto front, resulting in stable and efficient population updates. Our experiments on the Ant-Maze environment show that QD-RL can solve challenging exploration and control problems with deceptive rewards while being more than 15 times more sample efficient than its evolutionary counterparts.
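The selection from the diversity-return Pareto front can be illustrated with a minimal sketch. The function name `pareto_front`, the score tuples, and the example population are hypothetical; the abstract does not say how diversity is measured or how many agents are kept, so this only shows the generic non-dominated filter.

```python
from typing import List, Tuple


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return indices of agents that no other agent dominates.

    Each score is a (diversity, return) pair. Agent j dominates agent i
    if j is at least as good on both criteria and strictly better on one.
    """
    front = []
    for i, (d_i, r_i) in enumerate(scores):
        dominated = any(
            d_j >= d_i and r_j >= r_i and (d_j > d_i or r_j > r_i)
            for j, (d_j, r_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical population: (diversity score, return) per agent.
population = [(0.9, 1.0), (0.5, 3.0), (0.4, 2.0), (0.1, 4.0), (0.9, 0.5)]
selected = pareto_front(population)
print(selected)  # [0, 1, 3]
```

Agents 2 and 4 are dropped because another agent is at least as diverse and earns at least as much return.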
Deep covariate-learning: optimising information extraction from terrain texture for geostatistical modelling applications
Where data is available, it is desirable in geostatistical modelling to make use of additional covariates, for example terrain data, in order to improve prediction accuracy in the modelling task. While elevation itself may be important, additional explanatory power for any given problem can be sought (but not necessarily found) by filtering digital elevation models to extract higher-order derivatives such as slope angles, curvatures, and roughness. In essence, it would be beneficial to extract as much task-relevant information as possible from the elevation grid. However, given the complexities of the natural world, chance dictates that the use of 'off-the-shelf' filters is unlikely to derive covariates that provide strong explanatory power to the target variable at hand, and any attempt to manually design informative covariates is likely to be a trial-and-error process -- not optimal. In this paper we present a solution to this problem in the form of a deep learning approach to automatically deriving optimal task-specific terrain texture covariates from a standard SRTM 90m gridded digital elevation model (DEM). For our target variables we use point-sampled geochemical data from the British Geological Survey: concentrations of potassium, calcium and arsenic in stream sediments. We find that our deep learning approach produces covariates for geostatistical modelling that have surprisingly strong explanatory power on their own, with R-squared values around 0.6 for all three elements (with arsenic on the log scale). These results are achieved without the neural network being provided with easting, northing, or absolute elevation as inputs, and purely reflect the capacity of our deep neural network to extract task-specific information from terrain texture. We hope that these results will inspire further investigation into the capabilities of deep learning within geostatistical applications.
Why Normalizing Flows Fail to Detect Out-of-Distribution Data
Kirichenko, Polina, Izmailov, Pavel, Wilson, Andrew Gordon
Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often surprisingly fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection. We demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations which are not specific to the target image dataset. We show that by modifying the architecture of flow coupling layers we can bias the flow towards learning the semantic structure of the target data, improving OOD detection. Our investigation reveals that properties that enable flows to generate high-fidelity images can have a detrimental effect on OOD detection.
Towards Understanding the Effect of Leak in Spiking Neural Networks
Chowdhury, Sayeed Shafayet, Lee, Chankyu, Roy, Kaushik
Over the past few years, advances in deep artificial neural networks (ANNs) have led to remarkable success in various cognitive tasks (e.g., vision, language and behavior). In some cases, neural networks have outperformed conventional algorithms and achieved human-level performance [1, 2]. However, recent ANNs are becoming extremely compute-intensive and often do not generalize well to data unseen during training. On the other hand, the human brain can reliably learn and compute intricate cognitive tasks with a power budget of only a few watts. Recently, Spiking Neural Networks (SNNs) have been explored toward realizing robust and energy-efficient machine intelligence guided by the cues from neuroscience experiments [3]. SNNs are categorized as the new generation of neural networks [4] based on their neuronal functionalities. A variety of spiking neuron models largely resemble biological neuronal mechanisms, which transmit information through discrete spatiotemporal events (or spikes). These spiking neuron models can be characterized by their internal state, called the membrane potential. A spiking neuron integrates the inputs over time and fires an output spike whenever the membrane potential exceeds a threshold.
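The integrate-and-fire behavior described above, together with the leak named in the title, can be sketched as a discrete-time neuron. This is an illustrative assumption rather than the paper's model: the multiplicative `leak` factor, the hard reset to zero, and the function name `simulate_lif` are all hypothetical choices.

```python
def simulate_lif(inputs, leak=1.0, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    leak is a multiplicative decay on the membrane potential per step
    (1.0 means no leak). The neuron emits a spike (1) when the potential
    exceeds the threshold and is then reset to zero (assumed hard reset).
    """
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = leak * potential + x  # integrate input with decay
        if potential > threshold:
            spikes.append(1)
            potential = 0.0
        else:
            spikes.append(0)
    return spikes


inputs = [0.5, 0.5, 0.5, 0.5]
no_leak = simulate_lif(inputs, leak=1.0)      # [0, 0, 1, 0]
strong_leak = simulate_lif(inputs, leak=0.5)  # [0, 0, 0, 0]
```

With the same input train, the neuron without leak fires on the third step, while the strongly leaky neuron never crosses the threshold because its potential saturates below 1.0.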
Unsupervised Deep Learning of Incompressible Fluid Dynamics
Wandel, Nils, Weinmann, Michael, Klein, Reinhard
Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-aided aerodynamic design of automobiles or airplanes to simulations of physical effects in CGI to research in meteorology. Recent differentiable fluid simulations allow gradient-based methods to optimize, e.g., fluid control systems in an informed manner. Solving the partial differential equations that govern the dynamics of the underlying physical systems, however, is a challenging task, and current numerical approximation schemes still come at high computational costs. In this work, we propose an unsupervised framework that allows powerful deep neural networks to learn the dynamics of incompressible fluids end to end on a grid-based representation. For this purpose, we introduce a loss function that penalizes residuals of the incompressible Navier-Stokes equations. After training, the framework yields models that are capable of fast and differentiable fluid simulations and can handle various fluid phenomena such as the Magnus effect and Kármán vortex streets. Besides demonstrating its real-time capability on a GPU, we exploit our approach in a control optimization scenario.
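To make the idea of a residual-penalizing loss concrete, the sketch below covers only the incompressibility constraint (zero divergence of the velocity field), not the momentum equation. The grid layout, the central-difference stencil, and the function name `divergence_loss` are assumptions for illustration; the abstract does not specify the discretization.

```python
def divergence_loss(u, v, h=1.0):
    """Mean squared discrete divergence over interior grid points.

    u[i][j] and v[i][j] are the x- and y-velocity components at the
    grid point (x = i * h, y = j * h). Central differences approximate
    du/dx and dv/dy; an incompressible field gives a loss of zero.
    """
    n, m = len(u), len(u[0])
    total, count = 0.0, 0
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            du_dx = (u[i + 1][j] - u[i - 1][j]) / (2 * h)
            dv_dy = (v[i][j + 1] - v[i][j - 1]) / (2 * h)
            total += (du_dx + dv_dy) ** 2
            count += 1
    return total / count


size = 4
# Rigid rotation (u, v) = (-y, x): divergence-free.
rot_u = [[-float(j) for j in range(size)] for i in range(size)]
rot_v = [[float(i) for j in range(size)] for i in range(size)]
# Radial expansion (u, v) = (x, y): divergence equals 2 everywhere.
exp_u = [[float(i) for j in range(size)] for i in range(size)]
exp_v = [[float(j) for j in range(size)] for i in range(size)]

rotation_loss = divergence_loss(rot_u, rot_v)   # 0.0
expansion_loss = divergence_loss(exp_u, exp_v)  # 4.0
```

In a training setup, a term of this kind would be evaluated on the network's predicted velocity field and combined with a residual of the momentum equation.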
Counterexample-Guided Learning of Monotonic Neural Networks
Sivaraman, Aishwarya, Farnadi, Golnoosh, Millstein, Todd, Broeck, Guy Van den
The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time. Additionally, we propose a technique to use monotonicity as an inductive bias for deep learning. It works by iteratively incorporating monotonicity counterexamples in the learning process. Contrary to prior work in monotonic learning, we target general ReLU neural networks and do not further restrict the hypothesis space. We have implemented these techniques in a tool called COMET. Experiments on real-world datasets demonstrate that our approach achieves state-of-the-art results compared to existing monotonic learners, and can improve the model quality compared to those that were trained without taking monotonicity constraints into account.
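A monotonicity counterexample is a pair of inputs that differ only by an increase in a constrained feature, yet the output decreases. The sketch below searches for such a pair by random sampling. This is a swapped-in stand-in, not the COMET procedure: random search can find violations but cannot prove their absence, and the function name `find_counterexample` and its parameters are hypothetical.

```python
import random


def find_counterexample(f, dim, mono_idx, trials=1000, seed=0, low=0.0, high=1.0):
    """Search for a pair (x, x_up) that violates monotonicity of f.

    x_up equals x except that feature mono_idx is increased. Returns the
    pair if f(x_up) < f(x) for some sampled pair, otherwise None. A None
    result is NOT a proof of monotonicity.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(low, high) for _ in range(dim)]
        x_up = list(x)
        x_up[mono_idx] = rng.uniform(x[mono_idx], high)
        if f(x_up) < f(x):
            return x, x_up
    return None


def monotone(x):
    return 2 * x[0] + x[1]         # increasing in feature 0


def bumpy(x):
    return -(x[0] - 0.5) ** 2      # rises, then falls in feature 0


none_found = find_counterexample(monotone, dim=2, mono_idx=0)
cex = find_counterexample(bumpy, dim=1, mono_idx=0)
```

A counterexample found this way could be added to the training data with a corrected label, which is the general idea behind using counterexamples as an inductive bias.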
FedGAN: Federated Generative Adversarial Networks for Distributed Data
Rasouli, Mohammad, Sun, Tao, Rajagopal, Ram
We propose Federated Generative Adversarial Network (FedGAN) for training a GAN across distributed sources of non-independent-and-identically-distributed data subject to communication and privacy constraints. Our algorithm uses local generators and discriminators which are periodically synced via an intermediary that averages and broadcasts the generator and discriminator parameters. We theoretically prove the convergence of FedGAN with both equal and two time-scale updates of generator and discriminator, under standard assumptions, using stochastic approximations and communication-efficient stochastic gradient descent. We evaluate FedGAN on toy examples (2D system, mixed Gaussian, and Swiss roll), image datasets (MNIST, CIFAR-10, and CelebA), and time series datasets (household electricity consumption and electric vehicle charging sessions). We show that FedGAN converges and performs similarly to a general distributed GAN while reducing communication complexity. We also show its robustness to reduced communications.
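The periodic synchronization step can be sketched as follows. Parameters are modeled as flat lists of floats, and the average is unweighted; both are simplifying assumptions, as are the names `average_parameters` and `sync`. The abstract does not state whether agents are weighted, for example by local dataset size.

```python
def average_parameters(local_params):
    """Element-wise mean of several equally sized parameter vectors."""
    n = len(local_params)
    return [sum(values) / n for values in zip(*local_params)]


def sync(agents):
    """Average generator and discriminator parameters, then broadcast.

    Each agent is a dict with 'generator' and 'discriminator' lists.
    After the call, every agent holds a copy of the averaged parameters.
    """
    avg_g = average_parameters([a["generator"] for a in agents])
    avg_d = average_parameters([a["discriminator"] for a in agents])
    for a in agents:
        a["generator"] = list(avg_g)
        a["discriminator"] = list(avg_d)


agents = [
    {"generator": [1.0, 2.0], "discriminator": [0.0]},
    {"generator": [3.0, 4.0], "discriminator": [2.0]},
]
sync(agents)
print(agents[0])  # {'generator': [2.0, 3.0], 'discriminator': [1.0]}
```

Between calls to `sync`, each agent would run its own local generator and discriminator updates on its private data, so only parameters, not data, pass through the intermediary.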
Exact and Metaheuristic Approaches for the Production Leveling Problem
Vass, Johannes, Lackner, Marie-Louise, Musliu, Nysret
In this paper we introduce a new problem in the field of production planning which we call the Production Leveling Problem. The task is to assign orders to production periods such that the load in each period and on each production resource is balanced, capacity limits are not exceeded and the orders' priorities are taken into account. Production Leveling is an important intermediate step between long-term planning and the final scheduling of orders within a production period, as it is responsible for selecting good subsets of orders to be scheduled within each period. A formal model of the problem is proposed and NP-hardness is shown by reduction from Bin Packing. As an exact method for solving moderately sized instances we introduce a MIP formulation. For solving large problem instances, metaheuristic local search is investigated. A greedy heuristic and two neighborhood structures for local search are proposed, in order to apply them using Variable Neighborhood Descent and Simulated Annealing. Regarding exact techniques, the main research question is up to which size instances can be solved within a fixed amount of time. For the metaheuristic approaches the aim is to show that they produce near-optimal solutions for smaller instances, but also scale well to very large instances. A set of realistic problem instances from an industrial partner is contributed to the literature, as well as random instance generators. The experimental evaluation shows that the proposed MIP model works well for instances with up to 250 orders. Out of the investigated metaheuristic approaches, Simulated Annealing achieves the best results. It is shown to produce solutions with less than 3% average optimality gap on small instances and to scale well up to thousands of orders and dozens of periods and products. The presented metaheuristic methods are already being used in industry.
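A greedy construction for a heavily simplified version of the leveling task is sketched below. It assumes a single production resource and ignores capacity limits and order priorities, so it is not the heuristic from the paper; the function name `greedy_level` and the example data are hypothetical. The rule is the classic "largest order first, into the least loaded period".

```python
def greedy_level(order_sizes, num_periods):
    """Assign orders to periods so that period loads stay balanced.

    Orders are processed from largest to smallest, and each one goes to
    the currently least loaded period (ties go to the earliest period).
    Returns (assignment, loads), where assignment maps order index to
    period index.
    """
    loads = [0] * num_periods
    assignment = {}
    by_size = sorted(range(len(order_sizes)), key=lambda i: -order_sizes[i])
    for idx in by_size:
        period = min(range(num_periods), key=lambda k: loads[k])
        assignment[idx] = period
        loads[period] += order_sizes[idx]
    return assignment, loads


assignment, loads = greedy_level([7, 5, 4, 3, 1], num_periods=2)
print(loads)  # [10, 10]
```

A solution built this way could serve as the starting point for a local search that moves or swaps orders between periods.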