In domains ranging from computer vision to natural language processing, machine learning models have been shown to exhibit stark disparities, often performing worse for members of traditionally underserved groups. One factor contributing to these performance gaps is a lack of representation in the data the models are trained on. It is often unclear, however, how to operationalize representativeness in specific applications. Here we formalize the problem of creating equitable training datasets, and propose a statistical framework for addressing this problem. We consider a setting where a model builder must decide how to allocate a fixed data collection budget to gather training data from different subgroups. We then frame dataset creation as a constrained optimization problem, in which one maximizes a function of group-specific performance metrics based on (estimated) group-specific learning rates and costs per sample. This flexible approach incorporates preferences of model-builders and other stakeholders, as well as the statistical properties of the learning task. When data collection decisions are made sequentially, we show that under certain conditions this optimization problem can be efficiently solved even without prior knowledge of the learning rates. To illustrate our approach, we conduct a simulation study of polygenic risk scores on synthetic genomic data -- an application domain that often suffers from non-representative data collection. We find that our adaptive sampling strategy outperforms several common data collection heuristics, including equal and proportional sampling, demonstrating the value of strategic dataset design for building equitable models.
Petropoulos, Fotios, Apiletti, Daniele, Assimakopoulos, Vassilios, Babai, Mohamed Zied, Barrow, Devon K., Taieb, Souhaib Ben, Bergmeir, Christoph, Bessa, Ricardo J., Bijak, Jakub, Boylan, John E., Browell, Jethro, Carnevale, Claudio, Castle, Jennifer L., Cirillo, Pasquale, Clements, Michael P., Cordeiro, Clara, Oliveira, Fernando Luiz Cyrino, De Baets, Shari, Dokumentov, Alexander, Ellison, Joanne, Fiszeder, Piotr, Franses, Philip Hans, Frazier, David T., Gilliland, Michael, Gönül, M. Sinan, Goodwin, Paul, Grossi, Luigi, Grushka-Cockayne, Yael, Guidolin, Mariangela, Guidolin, Massimo, Gunter, Ulrich, Guo, Xiaojia, Guseo, Renato, Harvey, Nigel, Hendry, David F., Hollyman, Ross, Januschowski, Tim, Jeon, Jooyoung, Jose, Victor Richmond R., Kang, Yanfei, Koehler, Anne B., Kolassa, Stephan, Kourentzes, Nikolaos, Leva, Sonia, Li, Feng, Litsiou, Konstantia, Makridakis, Spyros, Martin, Gael M., Martinez, Andrew B., Meeran, Sheik, Modis, Theodore, Nikolopoulos, Konstantinos, Önkal, Dilek, Paccagnini, Alessia, Panagiotelis, Anastasios, Panapakidis, Ioannis, Pavía, Jose M., Pedio, Manuela, Pedregal, Diego J., Pinson, Pierre, Ramos, Patrícia, Rapach, David E., Reade, J. James, Rostami-Tabar, Bahman, Rubaszek, Michał, Sermpinis, Georgios, Shang, Han Lin, Spiliotis, Evangelos, Syntetos, Aris A., Talagala, Priyanga Dilini, Talagala, Thiyanga S., Tashman, Len, Thomakos, Dimitrios, Thorarinsdottir, Thordis, Todini, Ezio, Arenas, Juan Ramón Trapero, Wang, Xiaoqian, Winkler, Robert L., Yusupova, Alisa, Ziel, Florian
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.
The ability to recover from an unexpected external perturbation is a fundamental motor skill in bipedal locomotion. An effective response includes the ability to not just recover balance and maintain stability but also to fall in a safe manner when balance recovery is physically infeasible. For robots associated with bipedal locomotion, such as humanoid robots and assistive robotic devices that aid humans in walking, designing controllers which can provide this stability and safety can prevent damage to robots or prevent injury related medical costs. This is a challenging task because it involves generating highly dynamic motion for a high-dimensional, non-linear and under-actuated system with contacts. Despite prior advancements in using model-based and optimization methods, challenges such as requirement of extensive domain knowledge, relatively large computational time and limited robustness to changes in dynamics still make this an open problem. In this thesis, to address these issues we develop learning-based algorithms capable of synthesizing push recovery control policies for two different kinds of robots : Humanoid robots and assistive robotic devices that assist in bipedal locomotion. Our work can be branched into two closely related directions : 1) Learning safe falling and fall prevention strategies for humanoid robots and 2) Learning fall prevention strategies for humans using a robotic assistive devices. To achieve this, we introduce a set of Deep Reinforcement Learning (DRL) algorithms to learn control policies that improve safety while using these robots.
Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a drift in the learning process, in both local (client) and global (server) optimizations, which harms the convergence as well as model performance. Many previous works have attempted to address the non-iid issue by tackling the drift locally or globally, but how to jointly solve the two essentially coupled drifts is still unclear. In this work, we concentrate on handling both local and global drifts and introduce a new harmonizing framework called HarmoFL. First, we propose to mitigate the local update drift by normalizing amplitudes of images transformed into the frequency domain to mimic a unified imaging setting, in order to generate a harmonized feature space across local clients. Second, based on harmonized features, we design a client weight perturbation guiding each local model to reach a flat optimum, where a neighborhood area of the local optimal solution has a uniformly low loss. Without any extra communication cost, the perturbation assists the global model to optimize towards a converged optimal solution by aggregating several local flat optima. We have theoretically analyzed the proposed method and empirically conducted extensive experiments on three medical image classification and segmentation tasks, showing that HarmoFL outperforms a set of recent state-of-the-art methods with promising convergence behavior. Code is available at https://github.com/med-air/HarmoFL.
Machine learning-based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery. Here we propose QMO, a generic query-based molecule optimization framework that exploits latent embeddings from a molecule autoencoder. QMO improves the desired properties of an input molecule based on efficient queries, guided by a set of molecular property predictions and evaluation metrics. We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules for drug-likeness and solubility under similarity constraints. We also demonstrate substantial property improvement using QMO on two new and challenging tasks that are also important in real-world discovery problems: (1) optimizing existing potential SARS-CoV-2 main protease inhibitors towards higher binding affinity and (2) improving known antimicrobial peptides towards lower toxicity. Results from QMO show high consistency with external validations, suggesting an effective means to facilitate material optimization problems with design constraints. Zeroth-order optimization is used on problems where no explicit gradient function is accessible, but single points can be queried. Hoffman et al. present here a molecular design method that uses zeroth-order optimization to deal with the discreteness of molecule sequences and to incorporate external guidance from property evaluations and design constraints.
We consider Bayesian optimization of the output of a network of functions, where each function takes as input the output of its parent nodes, and where the network takes significant time to evaluate. Such problems arise, for example, in reinforcement learning, engineering design, and manufacturing. While the standard Bayesian optimization approach observes only the final output, our approach delivers greater query efficiency by leveraging information that the former ignores: intermediate output within the network. This is achieved by modeling the nodes of the network using Gaussian processes and choosing the points to evaluate using, as our acquisition function, the expected improvement computed with respect to the implied posterior on the objective. Although the non-Gaussian nature of this posterior prevents computing our acquisition function in closed form, we show that it can be efficiently maximized via sample average approximation. In addition, we prove that our method is asymptotically consistent, meaning that it finds a globally optimal solution as the number of evaluations grows to infinity, thus generalizing previously known convergence results for the expected improvement. Notably, this holds even though our method might not evaluate the domain densely, instead leveraging problem structure to leave regions unexplored. Finally, we show that our approach dramatically outperforms standard Bayesian optimization methods in several synthetic and real-world problems.
If you are in the fields of data science or machine learning, chances are you already are doing optimization! For example, training a neural network is an optimization problem, as we want to find the set of model weights that best minimizes the loss function. Finding the set of hyper parameters that results in the best performing model is another optimization problem. Optimization algorithms come in many forms, each created to solve a particular type of problem. In particular, one type of problem commonly faced by scientists in both academia and industry is the optimization of expensive-to-evaluate black box functions.
Hyperparameter optimization (HPO) is a necessary step to ensure the best possible performance of Machine Learning (ML) algorithms. Several methods have been developed to perform HPO; most of these are focused on optimizing one performance measure (usually an error-based measure), and the literature on such single-objective HPO problems is vast. Recently, though, algorithms have appeared which focus on optimizing multiple conflicting objectives simultaneously. This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms, distinguishing between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both. We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
Lavin, Alexander, Zenil, Hector, Paige, Brooks, Krakauer, David, Gottschlich, Justin, Mattson, Tim, Anandkumar, Anima, Choudry, Sanjay, Rocki, Kamil, Baydin, Atılım Güneş, Prunkl, Carina, Paige, Brooks, Isayev, Olexandr, Peterson, Erik, McMahon, Peter L., Macke, Jakob, Cranmer, Kyle, Zhang, Jiaxin, Wainwright, Haruko, Hanuka, Adi, Veloso, Manuela, Assefa, Samuel, Zheng, Stephan, Pfeffer, Avi
The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can be specified yet, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a system-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations in multiple resolutions. The proposed algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm based on annealing optimization, solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model i.e. the number of the neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of the state-of-the-art learning algorithms and deep learning methods.