Gomes, Carla


Understanding Batch Normalization

arXiv.org Machine Learning

Batch normalization is a ubiquitous deep learning technique that normalizes activations in intermediate layers. It is associated with improved accuracy and faster learning, but despite its enormous success, there is little consensus regarding why it works. We aim to rectify this and take an empirical approach to understanding batch normalization. Our primary observation is that the higher learning rates that batch normalization enables have a regularizing effect that dramatically improves generalization of normalized networks; we demonstrate this empirically and motivate it theoretically. We show how activations become large and how the convolutional channels become increasingly ill-behaved in the deeper layers of unnormalized networks, and how this results in larger input-independent gradients. Beyond just gradient scaling, we demonstrate how the learning rate in unnormalized networks is further limited because, for large parameter updates, the magnitude of activations grows exponentially with network depth, a problem batch normalization trivially avoids. Motivated by recent results in random matrix theory, we argue that the ill-conditioning of the activations is due to fluctuations in random initialization, shedding new light on classical initialization schemes and their consequences.
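
For reference, the normalization step itself is simple: each feature is standardized using the statistics of the current mini-batch and then rescaled by learnable parameters. The NumPy sketch below shows only the training-mode computation (the running statistics used at inference time are omitted).

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x has shape (batch, features); standardize each feature over the batch,
        # then rescale and shift with the learnable parameters gamma and beta.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta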


Imitation Refinement

arXiv.org Artificial Intelligence

Many real-world tasks involve identifying patterns in data that satisfy background and prior knowledge, for which ground truth is not available but ideal data can be obtained, for example through theoretical simulations. We propose a novel approach, imitation refinement, which refines imperfect patterns by imitating ideal patterns. The imperfect patterns are obtained, for example, using an unsupervised learner. Imitation refinement imitates ideal data by incorporating the prior knowledge captured by a classifier trained on the ideal data: an imitation refiner applies small modifications to imperfect patterns so that the classifier can identify them. In a sense, imitation refinement fits the data to the classifier, which complements the classical supervised learning task. We show that our imitation refinement approach outperforms existing methods in identifying crystal patterns from X-ray diffraction data in materials discovery. We also show the generality of our approach by illustrating its applicability to a computer vision task.
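
One way to picture "fitting the data to the classifier" is gradient-based refinement of the input itself: hold a classifier trained on ideal data fixed and nudge an imperfect pattern until the classifier accepts it, while penalizing large departures from the original. The PyTorch sketch below is a hypothetical illustration of this idea, not the paper's exact formulation; classifier, target_class, and the regularization weight reg are assumed inputs.

    import torch

    def refine(x_imperfect, classifier, target_class, steps=50, lr=0.01, reg=0.1):
        # Keep the trained classifier fixed and optimize the input so that it is
        # classified as the desired ideal class, while staying close to the original pattern.
        x = x_imperfect.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(classifier(x), target_class) \
                   + reg * torch.norm(x - x_imperfect)
            loss.backward()
            opt.step()
        return x.detach()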


Scalable Relaxations of Sparse Packing Constraints: Optimal Biocontrol in Predator-Prey Networks

AAAI Conferences

Cascades represent rapid changes in networks. A cascading phenomenon of ecological and economic impact is the spread of invasive species across geographic landscapes. The most promising management strategy is often biocontrol, which entails introducing a natural predator able to control the invading population, a setting that can be treated as two interacting cascades of predator and prey populations. We formulate and study a nonlinear problem of optimal biocontrol: optimally seeding the predator cascade over time to minimize the harmful prey population. The recurring budgets that conservation organizations typically face naturally lead to sparse constraints, which make the problem amenable to approximation algorithms. Available methods based on continuous relaxations scale poorly; to remedy this, we develop a novel and scalable randomized algorithm based on a width relaxation, applicable to a broad class of combinatorial optimization problems. We evaluate our contributions in the context of biocontrol for the insect pest Hemlock Woolly Adelgid (HWA) in eastern North America. Our algorithm outperforms competing methods in terms of scalability and solution quality and finds near-optimal strategies for the control of the HWA on fine-grained networks -- an important problem in computational sustainability.
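
To make the recurring-budget structure concrete, the sketch below shows a hypothetical greedy baseline, not the paper's width-relaxation algorithm: at each time step, the per-period budget is spent on the nodes whose predator seeding currently looks most beneficial according to a caller-supplied estimate.

    def greedy_seeding(nodes, horizon, budget_per_step, benefit):
        # Hypothetical greedy baseline: benefit(v, t) is an assumed estimate of the
        # reduction in the invasive prey population from seeding predators at node v
        # at time t; each time step may use at most budget_per_step seedings.
        plan = {t: [] for t in range(horizon)}
        seeded = set()
        for t in range(horizon):
            ranked = sorted((v for v in nodes if v not in seeded),
                            key=lambda v: benefit(v, t), reverse=True)
            plan[t] = ranked[:budget_per_step]
            seeded.update(plan[t])
        return plan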


Efficiently Approximating the Pareto Frontier: Hydropower Dam Placement in the Amazon Basin

AAAI Conferences

Real-world problems are often not fully characterized by a single optimal solution, as they frequently involve multiple competing objectives; it is therefore important to identify the so-called Pareto frontier, which captures solution trade-offs. We propose a fully polynomial-time approximation scheme based on Dynamic Programming (DP) for computing a polynomially succinct curve that approximates the Pareto frontier to within an arbitrarily small epsilon > 0 on tree-structured networks. Given a set of objectives, our approximation scheme runs in time polynomial in the size of the instance and 1/epsilon. We also propose a Mixed Integer Programming (MIP) scheme to approximate the Pareto frontier. The DP and MIP Pareto frontier approaches have complementary strengths and are surprisingly effective. We provide empirical results showing that our methods outperform other approaches in efficiency and accuracy. Our work is motivated by a problem in computational sustainability concerning the proliferation of hydropower dams throughout the Amazon basin. Our goal is to support decision-makers in evaluating impacted ecosystem services on the full scale of the Amazon basin. Our work is general and can be applied to approximate the Pareto frontier of a variety of multiobjective problems on tree-structured networks.
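
A core trick behind such approximation schemes is epsilon-trimming: the objective space is partitioned into geometric cells of ratio (1 + epsilon), and at most one representative per cell is retained, which keeps the curve polynomially succinct while losing at most a (1 + epsilon) factor in each objective. The sketch below illustrates only the trimming step for two minimized objectives; it is not the paper's tree-structured DP or MIP formulation.

    import math

    def epsilon_pareto(points, eps):
        # Keep at most one representative per geometric cell of ratio (1 + eps);
        # every discarded point is within a (1 + eps) factor of a kept one
        # in both (minimized) objectives.
        cells = {}
        for p in points:
            key = tuple(math.floor(math.log(max(v, 1e-12), 1 + eps)) for v in p)
            if key not in cells or sum(p) < sum(cells[key]):
                cells[key] = p
        # Discard dominated representatives.
        frontier = []
        for p in sorted(cells.values()):
            if not frontier or p[1] < frontier[-1][1]:
                frontier.append(p)
        return frontier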


Uncovering Hidden Structure through Parallel Problem Decomposition for the Set Basis Problem

AAAI Conferences

Exploiting parallelism is a key strategy for speeding up computation. However, on hard combinatorial problems, such a strategy has been surprisingly challenging due to the intricate variable interactions. In this paper we introduce a novel way in which parallelism can be used to exploit the hidden structure of hard combinatorial problems, orthogonal to divide-and-conquer and portfolio approaches. We demonstrate the success of this approach on the minimal set basis problem, which has a wide range of applications, e.g., in optimization, machine learning, and system security. We also show the effectiveness of our approach on a related application problem from materials discovery. In our approach, a large number of smaller sub-problems are identified and solved concurrently. We then aggregate the information from those solutions, and use this information to initialize the search of a global, complete solver. We show that this strategy leads to a significant speed-up over a sequential approach since the aggregated sub-problem solution information often provides key structural insights to the complete solver. Our approach also greatly outperforms state-of-the-art incomplete solvers in terms of solution quality. Our work opens up a novel angle for using parallelism to solve hard combinatorial problems.
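
In outline, the strategy is: solve many small sub-problems concurrently, aggregate their solutions, and hand the aggregate to a complete solver as its starting point. The Python skeleton below shows only this control flow; solve_subproblem, aggregate, and global_solver are assumed, caller-supplied components rather than parts of the paper's system.

    from multiprocessing import Pool

    def parallel_decomposition(subproblems, solve_subproblem, aggregate, global_solver):
        # Solve the sub-problems concurrently, aggregate the structural information
        # found in their solutions, and use it to initialize the complete solver.
        with Pool() as pool:
            partial_solutions = pool.map(solve_subproblem, subproblems)
        hint = aggregate(partial_solutions)
        return global_solver(initial_solution=hint)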


Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery

arXiv.org Machine Learning

Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g., for fuel and solar cells, we introduce CombiFD, a framework for factor-based pattern decomposition that allows the incorporation of a priori knowledge as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains.
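
At its core, factor-based pattern decomposition approximates a data matrix D by a product W H of basis patterns and activations. The sketch below is a simplified, unconstrained stand-in (projected alternating least squares); the paper's AMIQO additionally enforces the combinatorial constraints by solving a sequence of (mixed-integer) quadratic programs.

    import numpy as np

    def alternating_decomposition(D, k, iters=100, seed=0):
        # Approximate D (m x n) as W @ H with k nonnegative factors by alternating
        # least-squares updates, projecting each factor back onto the nonnegative
        # orthant. The combinatorial constraints are omitted in this simplified sketch.
        m, n = D.shape
        rng = np.random.default_rng(seed)
        W, H = rng.random((m, k)), rng.random((k, n))
        for _ in range(iters):
            H = np.clip(np.linalg.lstsq(W, D, rcond=None)[0], 0.0, None)
            W = np.clip(np.linalg.lstsq(H.T, D.T, rcond=None)[0].T, 0.0, None)
        return W, H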


Computational Sustainability: Editorial Introduction to the Summer and Fall Issues

AI Magazine

Computational sustainability problems, which exist in dynamic environments with high amounts of uncertainty, provide a variety of unique challenges to artificial intelligence research and the opportunity for significant impact upon our collective future. This editorial introduction provides an overview of artificial intelligence for computational sustainability, and introduces the special issue articles that appear in this issue and the previous issue of AI Magazine.


Designing Fast Absorbing Markov Chains

AAAI Conferences

Markov chains are a fundamental tool for the analysis of real-world phenomena and randomized algorithms. Given a graph with some specified sink nodes and an initial probability distribution, we consider the problem of designing an absorbing Markov chain that minimizes the time required to reach a sink node, by selecting transition probabilities subject to some natural regularity constraints. By exploiting the Markovian structure, we obtain closed-form expressions for the objective function as well as its gradient, which can thus be evaluated efficiently without any simulation of the underlying process and fed to a gradient-based optimization package. For the special case of designing reversible Markov chains, we show that the global optimum can be computed efficiently by exploiting convexity. We demonstrate how our method can be used for the evaluation and design of local search methods tailored for certain domains.
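
The closed form rests on a standard fact about absorbing chains: if Q is the transition matrix restricted to the transient (non-sink) states, the fundamental matrix (I - Q)^{-1} gives expected visit counts, so the vector of expected steps to absorption is (I - Q)^{-1} 1, and the design objective weights these times by the initial distribution. A minimal NumPy sketch of this evaluation (not of the gradient computation or the optimization itself):

    import numpy as np

    def expected_absorption_time(Q, p0):
        # Q: transition probabilities among transient states (rows may sum to less
        # than 1, with the missing mass flowing to sink nodes); p0: initial
        # distribution over the transient states. Solve (I - Q) t = 1 for the
        # expected number of steps to absorption, then average over p0.
        n = Q.shape[0]
        t = np.linalg.solve(np.eye(n) - Q, np.ones(n))
        return p0 @ t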


Large Landscape Conservation — Synthetic and Real-World Datasets

AAAI Conferences

Biodiversity underpins ecosystem goods and services and hence protecting it is key to achieving sustainability. However, the persistence of many species is threatened by habitat loss and fragmentation due to human land use and climate change. Conservation efforts are implemented under very limited economic resources, and therefore designing scalable, cost-efficient and systematic approaches for conservation planning is an important and challenging computational task. In particular, preserving landscape connectivity between good habitat has become a key conservation priority in recent years. We give an overview of landscape connectivity conservation and some of the underlying graph-theoretic optimization problems. We present a synthetic generator capable of creating families of randomized structured problems, capturing the essential features of real-world instances but allowing for a thorough typical-case performance evaluation of different solution methods. We also present two large-scale real-world datasets, including economic data on land cost, and species data for grizzly bears, wolverines and lynx.