Optimization
Optimal Sepsis Patient Treatment using Human-in-the-loop Artificial Intelligence
Gupta, Akash, Lash, Michael T., Nachimuthu, Senthil K.
Sepsis is one of the leading causes of death in Intensive Care Units (ICU). The strategy for treating sepsis involves the infusion of intravenous (IV) fluids and administration of antibiotics. Determining the optimal quantity of IV fluids is a challenging problem due to the complexity of a patient's physiology. In this study, we develop a data-driven optimization solution that derives the optimal quantity of IV fluids for individual patients. The proposed method minimizes the probability of severe outcomes by controlling the prescribed quantity of IV fluids and utilizes human-in-the-loop artificial intelligence. We demonstrate the performance of our model on 1122 ICU patients with sepsis diagnosis extracted from the MIMIC-III dataset. The results show that, on average, our model can reduce mortality by 22%. This study has the potential to help physicians synthesize optimal, patient-specific treatment strategies.
Data-Driven Topology Optimization with Multiclass Microstructures using Latent Variable Gaussian Process
Wang, Liwei, Tao, Siyu, Zhu, Ping, Chen, Wei
The data-driven approach is emerging as a promising method for the topological design of multiscale structures with greater efficiency. However, existing data-driven methods mostly focus on a single class of microstructures without considering multiple classes to accommodate spatially varying desired properties. The key challenge is the lack of an inherent ordering or distance measure between different classes of microstructures in meeting a range of properties. To overcome this hurdle, we extend the newly developed latent-variable Gaussian process (LVGP) models to create multi-response LVGP (MR-LVGP) models for the microstructure libraries of metamaterials, taking both qualitative microstructure concepts and quantitative microstructure design variables as mixed-variable inputs. The MR-LVGP model embeds the mixed variables into a continuous design space based on their collective effects on the responses, providing substantial insights into the interplay between different geometrical classes and material parameters of microstructures. With this model, we can easily obtain a continuous and differentiable transition between different microstructure concepts that can render gradient information for multiscale topology optimization. We demonstrate its benefits through multiscale topology optimization with aperiodic microstructures. Design examples reveal that considering multiclass microstructures can lead to improved performance due to the consistent load-transfer paths for micro- and macro-structures.
A sparse code increases the speed and efficiency of neuro-dynamic programming for optimal control tasks with correlated feature inputs
Sparse codes in neuroscience have been suggested to offer certain computational advantages over other neural representations of sensory data. To explore this viewpoint, a sparse code is used to represent natural images in an optimal control task solved with neuro-dynamic programming, and its computational properties are investigated. The central finding is that when feature inputs to a linear network are correlated, an over-complete sparse code increases the memory capacity of the network in an efficient manner beyond that possible for any complete code with the same-sized input, and also increases the speed of learning the network weights. A complete sparse code is found to maximise the memory capacity of a linear network by decorrelating its feature inputs to transform the design matrix of the least-squares problem to one of full rank. It also conditions the Hessian matrix of the least-squares problem, thereby increasing the rate of convergence to the optimal network weights. Other types of decorrelating codes would also achieve this. However, an over-complete sparse code is found to be approximately decorrelated, extracting a larger number of approximately decorrelated features from the same-sized input, allowing it to efficiently increase memory capacity beyond that possible for any complete code: a 2.25 times over-complete sparse code is shown to at least double memory capacity compared with a complete sparse code using the same input. This is used in sequential learning to store a potentially large number of optimal control tasks in the network, while catastrophic forgetting is avoided using a partitioned representation, yielding a cost-to-go function approximator that generalizes over the states in each partition. Sparse code advantages over dense codes and local codes are also discussed.
Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
Kim, Jang-Hyun, Choo, Wonho, Song, Hyun Oh
While deep neural networks achieve great performance on fitting the training distribution, the learned networks are prone to overfitting and are susceptible to adversarial attacks. In this regard, a number of mixup based augmentation methods have been recently proposed. However, these approaches mainly focus on creating previously unseen virtual examples and can sometimes provide misleading supervisory signal to the network. To this end, we propose Puzzle Mix, a mixup method for explicitly utilizing the saliency information and the underlying statistics of the natural examples. This leads to an interesting optimization problem alternating between the multi-label objective for optimal mixing mask and saliency discounted optimal transport objective. Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets. The source code is available at https://github.com/snu-mllab/PuzzleMix.
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations
Belakaria, Syrine, Deshwal, Aryan, Doppa, Janardhan Rao
Many real-world applications involve black-box optimization of multiple objectives using continuous function approximations that trade-off accuracy and resource cost of evaluation. For example, in rocket launching research, we need to find designs that trade-off return-time and angular distance using continuous-fidelity simulators (e.g., varying tolerance parameter to trade-off simulation time and accuracy) for design evaluations. The goal is to approximate the optimal Pareto set by minimizing the cost for evaluations. In this paper, we propose a novel approach referred to as information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations (iMOCA)} to solve this problem. The key idea is to select the sequence of input and function approximations for multiple objectives which maximize the information gain per unit cost for the optimal Pareto front. Our experiments on diverse synthetic and real-world benchmarks show that iMOCA significantly improves over existing single-fidelity methods.
Black-box Mixed-Variable Optimisation using a Surrogate Model that Satisfies Integer Constraints
Bliek, Laurens, Guijt, Arthur, Verwer, Sicco, de Weerdt, Mathijs
A challenging problem in both engineering and computer science is that of minimising a function for which we have no mathematical formulation available, that is expensive to evaluate, and that contains continuous and integer variables, for example in automatic algorithm configuration. Surrogate-based algorithms are very suitable for this type of problem, but most existing techniques are designed with only continuous or only discrete variables in mind. Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) is a surrogate-based algorithm that uses a linear combination of rectified linear units, defined in such a way that (local) optima satisfy the integer constraints. This method outperforms the state of the art on several synthetic benchmarks with up to 238 continuous and integer variables, and achieves competitive performance on two real-life benchmarks: XGBoost hyperparameter tuning and Electrostatic Precipitator optimisation.
METASET: Exploring Shape and Property Spaces for Data-Driven Metamaterials Design
Chan, Yu-Chin, Ahmed, Faez, Wang, Liwei, Chen, Wei
Data-driven design of mechanical metamaterials is an increasingly popular method to combat costly physical simulations and immense, often intractable, geometrical design spaces. Using a precomputed dataset of unit cells, a multiscale structure can be quickly filled via combinatorial search algorithms, and machine learning models can be trained to accelerate the process. However, the dependence on data induces a unique challenge: An imbalanced dataset containing more of certain shapes or physical properties can be detrimental to the efficacy of data-driven approaches. In answer, we posit that a smaller yet diverse set of unit cells leads to scalable search and unbiased learning. To select such subsets, we propose METASET, a methodology that 1) uses similarity metrics and positive semi-definite kernels to jointly measure the closeness of unit cells in both shape and property spaces, and 2) incorporates Determinantal Point Processes for efficient subset selection. Moreover, METASET allows the trade-off between shape and property diversity so that subsets can be tuned for various applications. Through the design of 2D metamaterials with target displacement profiles, we demonstrate that smaller, diverse subsets can indeed improve the search process as well as structural performance. By eliminating inherent overlaps in a dataset of 3D unit cells created with symmetry rules, we also illustrate that our flexible method can distill unique subsets regardless of the metric employed. Our diverse subsets are provided publicly for use by any designer.
Interpolating the Trace of the Inverse of Matrix $\mathbf{A} + t \mathbf{B}$
Ameli, Siavash, Shadden, Shawn C.
We develop heuristic interpolation methods for the function $t \mapsto \operatorname{trace}\left( (\mathbf{A} + t \mathbf{B})^{-1} \right)$, where the matrices $\mathbf{A}$ and $\mathbf{B}$ are symmetric and positive definite and $t$ is a real variable. This function is featured in many applications in statistics, machine learning, and computational physics. The presented interpolation functions are based on the modification of a sharp upper bound that we derive for this function, which is a new trace inequality for matrices. We demonstrate the accuracy and performance of the proposed method with numerical examples, namely, the marginal maximum likelihood estimation for linear Gaussian process regression and the estimation of the regularization parameter of ridge regression with the generalized cross-validation method.
MO-PaDGAN: Reparameterizing Engineering Designs for Augmented Multi-objective Optimization
Multi-objective optimization is key to solving many Engineering Design problems, where design parameters are optimized for several performance indicators. However, optimization results are highly dependent on how the designs are parameterized. Researchers have shown that deep generative models can learn compact design representations, providing a new way of parameterizing designs to achieve faster convergence and improved optimization performance. Despite their success in capturing complex distributions, existing generative models face three challenges when used for design problems: 1) generated designs have limited design space coverage, 2) the generator ignores design performance, and 3) the new parameterization is unable to represent designs beyond training data. To address these challenges, we propose MO-PaDGAN, which adds a Determinantal Point Processes based loss function to the generative adversarial network to simultaneously model diversity and (multi-variate) performance. MO-PaDGAN can thus improve the performances and coverage of generated designs, and even generate designs with performances exceeding those from training data. When using MO-PaDGAN as a new parameterization in multi-objective optimization, we can discover much better Pareto fronts even though the training data do not cover those Pareto fronts. In a real-world multi-objective airfoil design example, we demonstrate that MO-PaDGAN achieves, on average, a 186% improvement in the hypervolume indicator when compared to the vanilla GAN or other state-of-the-art parameterization methods.
Refined approachability algorithms and application to regret minimization with global costs
Blackwell's approachability is a framework where two players, the Decision Maker and the Environment, play a repeated game with vector-valued payoffs. The goal of the Decision Maker is to make the average payoff converge to a given set called the target. When this is indeed possible, simple algorithms which guarantee the convergence are known. This abstract tool was successfully used for the construction of optimal strategies in various repeated games, but also found several applications in online learning. By extending an approach proposed by Abernethy et al. (2011), we construct and analyze a class of Follow the Regularized Leader algorithms (FTRL) for Blackwell's approachability which are able to minimize not only the Euclidean distance to the target set (as it is often the case in the context of Blackwell's approachability) but a wide range of distance-like quantities. This flexibility enables us to apply these algorithms to minimize the exact quantity of interest in various online learning problems. In particular, for regret minimization with global costs, we obtain novel guarantees for general norm cost functions, and for the case of $\ell_p$ cost functions, we obtain the first regret bounds with explicit dependence in $p$ and the dimension $d$.