Optimization is a ubiquitous modeling tool and is often deployed in settings which repeatedly solve similar instances of the same problem. Amortized optimization methods use learning to predict the solutions to problems in these settings. This leverages the shared structure between similar problem instances. In this tutorial, we will discuss the key design choices behind amortized optimization, roughly categorizing 1) models into fully-amortized and semi-amortized approaches, and 2) learning methods into regression-based and objectivebased. We then view existing applications through these foundations to draw connections between them, including for manifold optimization, variational inference, sparse coding, meta-learning, control, reinforcement learning, convex optimization, and deep equilibrium networks. This framing enables us easily see, for example, that the amortized inference in variational autoencoders is conceptually identical to value gradients in control and reinforcement learning as they both use fully-amortized models with an objective-based loss.
Lavin, Alexander, Zenil, Hector, Paige, Brooks, Krakauer, David, Gottschlich, Justin, Mattson, Tim, Anandkumar, Anima, Choudry, Sanjay, Rocki, Kamil, Baydin, Atılım Güneş, Prunkl, Carina, Paige, Brooks, Isayev, Olexandr, Peterson, Erik, McMahon, Peter L., Macke, Jakob, Cranmer, Kyle, Zhang, Jiaxin, Wainwright, Haruko, Hanuka, Adi, Veloso, Manuela, Assefa, Samuel, Zheng, Stephan, Pfeffer, Avi
The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
The world is structured in countless ways. It may be prudent to enforce corresponding structural properties to a learning algorithm's solution, such as incorporating prior beliefs, natural constraints, or causal structures. Doing so may translate to faster, more accurate, and more flexible models, which may directly relate to real-world impact. In this dissertation, we consider two different research areas that concern structuring a learning algorithm's solution: when the structure is known and when it has to be discovered.
University campuses are essentially a microcosm of a city. They comprise diverse facilities such as residences, sport centres, lecture theatres, parking spaces, and public transport stops. Universities are under constant pressure to improve efficiencies while offering a better experience to various stakeholders including students, staff, and visitors. Nonetheless, anecdotal evidence indicates that campus assets are not being utilised efficiently, often due to the lack of data collection and analysis, thereby limiting the ability to make informed decisions on the allocation and management of resources. Advances in the Internet of Things (IoT) technologies that can sense and communicate data from the physical world, coupled with data analytics and Artificial intelligence (AI) that can predict usage patterns, have opened up new opportunities for organisations to lower cost and improve user experience. This thesis explores this opportunity via theory and experimentation using UNSW Sydney as a living laboratory.
In this thesis, we explore and apply methods inspired by the free energy principle to two important areas in machine learning and neuroscience. The free energy principle is a general mathematical theory of the necessary information-theoretic behaviours of systems which maintain a separation from their environment. A core postulate of the theory is that complex systems can be seen as performing variational Bayesian inference and minimizing an information-theoretic quantity called the variational free energy. The free energy principle originated in, and has been extremely influential in theoretical neuroscience, having spawned a number of neurophysiologically realistic process theories, and maintaining close links with Bayesian Brain viewpoints. The thesis is split into three main parts where we apply methods and insights from the free energy principle to understand questions first in perception, then action, and finally learning. Specifically, in the first section, we focus on the theory of predictive coding, a neurobiologically plausible process theory derived from the free energy principle under certain assumptions, which argues that the primary function of the brain is to minimize prediction errors. We focus on scaling up predictive coding architectures and simulate large-scale predictive coding networks for perception on machine learning benchmarks; we investigate predictive coding's relationship to other classical filtering algorithms, and we demonstrate that many biologically implausible aspects of current models of predictive coding can be relaxed without unduly harming the performance of predictive coding models which allows for a potentially more literal translation of predictive coding theory into cortical microcircuits. In the second part of the thesis, we focus on the application of methods deriving from the free energy principle to action. We study the extension of methods of'active inference', a neurobiologically grounded account of action through variational message passing, to utilize deep artificial neural networks, allowing these methods to'scale up' to be competitive with state of the art deep reinforcement learning methods.
Buluc, Aydin, Kolda, Tamara G., Wild, Stefan M., Anitescu, Mihai, DeGennaro, Anthony, Jakeman, John, Kamath, Chandrika, Ramakrishnan, null, Kannan, null, Lopes, Miles E., Martinsson, Per-Gunnar, Myers, Kary, Nelson, Jelani, Restrepo, Juan M., Seshadhri, C., Vrabie, Draguna, Wohlberg, Brendt, Wright, Stephen J., Yang, Chao, Zwart, Peter
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences. Throughout, the text discusses historical context and societal impact. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices.
Over the last decade, the long-running endeavour to automate high-level processes in machine learning (ML) has risen to mainstream prominence, stimulated by advances in optimisation techniques and their impact on selecting ML models/algorithms. Central to this drive is the appeal of engineering a computational system that both discovers and deploys high-performance solutions to arbitrary ML problems with minimal human interaction. Beyond this, an even loftier goal is the pursuit of autonomy, which describes the capability of the system to independently adjust an ML solution over a lifetime of changing contexts. However, these ambitions are unlikely to be achieved in a robust manner without the broader synthesis of various mechanisms and theoretical frameworks, which, at the present time, remain scattered across numerous research threads. Accordingly, this review seeks to motivate a more expansive perspective on what constitutes an automated/autonomous ML system, alongside consideration of how best to consolidate those elements. In doing so, we survey developments in the following research areas: hyperparameter optimisation, multi-component models, neural architecture search, automated feature engineering, meta-learning, multi-level ensembling, dynamic adaptation, multi-objective evaluation, resource constraints, flexible user involvement, and the principles of generalisation. We also develop a conceptual framework throughout the review, augmented by each topic, to illustrate one possible way of fusing high-level mechanisms into an autonomous ML system. Ultimately, we conclude that the notion of architectural integration deserves more discussion, without which the field of automated ML risks stifling both its technical advantages and general uptake.
Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.