Binois, Mickael
Gearing Gaussian process modeling and sequential design towards stochastic simulators
Binois, Mickael, Fadikar, Arindam, Stevens, Abby
Accurately reproducing real-world dynamics often requires stochastic simulators, particularly in fields like epidemiology, operations research, and hyperparameter tuning. In these contexts it becomes important to distinguish aleatoric uncertainty, arising from noise in the observations, from epistemic uncertainty, stemming from uncertainty in the model. The former is sometimes called intrinsic uncertainty while the latter is referred to as extrinsic uncertainty, see, e.g., Ankenman et al. (2010). Gaussian process (GP) based surrogate methods (see, e.g., Rasmussen and Williams (2006); Gramacy (2020)) can be easily adapted from deterministic to noisy settings while maintaining strong predictive power, computational efficiency, and analytical tractability. Even in the deterministic setup, it is common to add a small diagonal nugget (also known as jitter) term to the covariance matrix of the GP equations to ease its numerical inversion. This term can also be interpreted as regularization, especially in the reproducing kernel Hilbert space (RKHS) context, see, e.g., Kanagawa et al. (2018). This can be contrasted with the use of pseudo-inverses, which revert to interpolation, see for instance the discussion by Mohammadi et al. (2016). Here we prefer the term noise variance, to relate it to intrinsic uncertainty and because the nugget effect has a different meaning in the kriging literature (see, e.g., Roustant et al. (2012)).
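To make the role of this term concrete, here is a minimal sketch (with illustrative names, not the paper's implementation) of GP prediction in which a noise variance g enters the diagonal of the covariance matrix: a tiny g acts as a numerical jitter, while a larger g models intrinsic observation noise.

```python
# Minimal sketch, not the paper's code: squared-exponential GP prediction
# with a diagonal noise-variance (nugget) term g in the kernel matrix.
# Names (k_se, gp_predict, lengthscale, g) are illustrative choices.
import numpy as np

def k_se(X1, X2, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-d inputs."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(X, y, Xnew, g=1e-6, lengthscale=0.2):
    """Predictive mean and variance with noise variance g on the diagonal.
    g ~ 1e-6 acts as a numerical jitter; a larger g models intrinsic noise."""
    K = k_se(X, X, lengthscale) + g * np.eye(len(X))      # noisy covariance
    Kinv_y = np.linalg.solve(K, y)
    kx = k_se(Xnew, X, lengthscale)
    mean = kx @ Kinv_y                                     # posterior mean
    # Epistemic (extrinsic) variance of the latent function at Xnew:
    var = 1.0 - np.sum(kx * np.linalg.solve(K, kx.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

# Noisy observations of a simple test function
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 20)
y = np.sin(6 * X) + rng.normal(scale=0.1, size=X.shape)   # aleatoric noise
Xnew = np.linspace(0, 1, 5)
print(gp_predict(X, y, Xnew, g=0.1 ** 2))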
Combining additivity and active subspaces for high-dimensional Gaussian process modeling
Binois, Mickael, Picheny, Victor
Gaussian processes are a widely embraced technique for regression and classification due to their good prediction accuracy, analytical tractability and built-in capabilities for uncertainty quantification. However, they suffer from the curse of dimensionality as the number of variables increases. This challenge is generally addressed by assuming additional structure in the problem, the preferred options being either additivity or low intrinsic dimensionality. Our contribution for high-dimensional Gaussian process modeling is to combine them with a multi-fidelity strategy, showcasing the advantages through experiments on synthetic functions and datasets.
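As a minimal sketch of how the two structural assumptions can coexist in a single covariance, the example below (assuming a squared-exponential base kernel; all names are illustrative and this is not the paper's exact model) combines an additive kernel over individual inputs with a kernel acting on a low-dimensional linear projection of the inputs, in the spirit of active subspaces.

```python
# Illustrative sketch only: a kernel mixing an additive component (sum of
# 1-d kernels) with a component on a low-dimensional projection X @ A.
import numpy as np

def se(d2, lengthscale):
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def additive_kernel(X1, X2, lengthscales):
    """Average of one-dimensional squared-exponential kernels, one per input."""
    k = np.zeros((X1.shape[0], X2.shape[0]))
    for j, ell in enumerate(lengthscales):
        d2 = (X1[:, j, None] - X2[None, :, j]) ** 2
        k += se(d2, ell)
    return k / len(lengthscales)

def projected_kernel(X1, X2, A, lengthscale=1.0):
    """Squared-exponential kernel on the projected inputs Z = X @ A."""
    Z1, Z2 = X1 @ A, X2 @ A
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return se(d2, lengthscale)

def combined_kernel(X1, X2, A, lengthscales, w=0.5):
    """Convex combination of the additive and projected components."""
    return w * additive_kernel(X1, X2, lengthscales) + \
           (1 - w) * projected_kernel(X1, X2, A)

rng = np.random.default_rng(1)
X = rng.uniform(size=(10, 8))                  # 8 input variables
A = np.linalg.qr(rng.normal(size=(8, 2)))[0]   # 2-d subspace basis
K = combined_kernel(X, X, A, lengthscales=np.full(8, 0.3))
print(K.shape, np.linalg.eigvalsh(K).min())    # valid covariance (PSD up to round-off)
```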
Shared active subspace for multivariate vector-valued functions
Musayeva, Khadija, Binois, Mickael
Many problems in machine learning, optimization, uncertainty quantification and sensitivity analysis suffer from the curse of dimensionality, where the performance and the complexity of the model worsen dramatically with the number of input variables. To alleviate this problem, one is interested in dimensionality reduction techniques. For instance, in machine learning, variable/feature selection methods (Guyon and Elisseeff, 2003) aim to find a subset of variables so as to improve the predictive performance of a learning algorithm, and in some algorithms, such as decision trees, variable selection is an inherent part of the learning process. The field of sensitivity analysis mostly deals with identifying the subset of input parameters whose uncertainty contributes significantly to that of the model output (Saltelli et al., 2008; Da Veiga et al., 2021). These approaches focus on the effects of the original variables and their interactions. However, it might be the case that the model or function of interest varies the most along directions not aligned with the coordinate axes. The widely used dimensionality reduction method of principal component analysis (PCA, also known as the Karhunen-Loève method) can be used to find a linear subspace of the input/output space containing most of its variance, but, by default, it does not take the input-output relationship into account. In ecological sciences, redundancy analysis applies PCA to the fitted values from a linear regression model to identify a subset of input parameters contributing significantly to the variation in the response matrix (Legendre et al., 2011).
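For intuition, the sketch below shows the basic active-subspace construction extended to a vector-valued function by summing the per-output gradient outer-product matrices; the leading eigenvectors of the sum give one candidate shared subspace. This is a hedged illustration only, not necessarily the estimator proposed in the paper, and all names are illustrative.

```python
# Hedged sketch: shared subspace from the sum over outputs of the
# gradient outer-product matrices C_j = E[grad f_j grad f_j^T].
import numpy as np

def shared_active_subspace(grad_fn, d, n_samples=2000, k=2, rng=None):
    """grad_fn(x) -> (p, d) array stacking the gradients of the p outputs at x."""
    if rng is None:
        rng = np.random.default_rng(0)
    C = np.zeros((d, d))
    for _ in range(n_samples):
        x = rng.uniform(-1, 1, d)           # inputs sampled on [-1, 1]^d
        G = grad_fn(x)                       # stack of output gradients
        C += G.T @ G                         # sum_j grad f_j grad f_j^T
    C /= n_samples
    eigval, eigvec = np.linalg.eigh(C)       # ascending eigenvalues
    return eigvec[:, ::-1][:, :k], eigval[::-1]

# Toy 2-output function of 5 variables, both outputs varying only along w1, w2
w1, w2 = np.array([1., 1., 0., 0., 0.]), np.array([0., 0., 1., -1., 0.])
def grads(x):
    return np.vstack([np.cos(x @ w1) * w1,   # gradient of sin(w1 . x)
                      2 * (x @ w2) * w2])    # gradient of (w2 . x)^2

W, lam = shared_active_subspace(grads, d=5, k=2)
print(np.round(lam, 3))   # only two non-negligible eigenvalues expected
```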
Trajectory-oriented optimization of stochastic epidemiological models
Fadikar, Arindam, Binois, Mickael, Collier, Nicholson, Stevens, Abby, Toh, Kok Ben, Ozik, Jonathan
Epidemiological models must be calibrated to ground truth for downstream tasks such as producing forward projections or running what-if scenarios. The meaning of calibration changes in the case of a stochastic model, since the output of such a model is generally described via an ensemble or a distribution, with each member of the ensemble usually mapped (explicitly or implicitly) to a random number seed. With the goal of finding not only the input parameter settings but also the random seeds that are consistent with the ground truth, we propose a class of Gaussian process (GP) surrogates along with an optimization strategy based on Thompson sampling. This Trajectory Oriented Optimization (TOO) approach produces actual trajectories close to the empirical observations, instead of a set of parameter settings where only the mean simulation behavior matches the ground truth.
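A minimal sketch of the Thompson-sampling step on which such strategies rely: draw one joint sample from the GP posterior over candidate inputs and run the simulator at the minimizer of that draw. The seed- and trajectory-specific GP modeling of the TOO approach is not reproduced here, and all names are illustrative.

```python
# Minimal sketch of Thompson sampling on a GP surrogate (not the TOO model).
import numpy as np

def se_kernel(X1, X2, ell=0.2):
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def thompson_next_point(X, y, Xcand, g=1e-4, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    K = se_kernel(X, X) + g * np.eye(len(X))
    kx = se_kernel(Xcand, X)
    mean = kx @ np.linalg.solve(K, y)
    cov = se_kernel(Xcand, Xcand) - kx @ np.linalg.solve(K, kx.T)
    cov += 1e-8 * np.eye(len(Xcand))                 # jitter for sampling
    sample = rng.multivariate_normal(mean, cov)      # one posterior draw
    return Xcand[np.argmin(sample)]                  # minimize the draw

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, 8)
y = (X - 0.3) ** 2 + rng.normal(scale=0.05, size=X.shape)   # noisy objective
Xcand = np.linspace(0, 1, 200)
print(thompson_next_point(X, y, Xcand, rng=rng))
```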
A portfolio approach to massively parallel Bayesian optimization
Binois, Mickael, Collier, Nicholson, Ozik, Jonathan
One way to reduce the time of conducting optimization studies is to evaluate designs in parallel rather than just one-at-a-time. For expensive-to-evaluate black-boxes, batch versions of Bayesian optimization have been proposed. They work by building a surrogate model of the black-box that can be used to select the designs to evaluate efficiently via an infill criterion. Still, with higher levels of parallelization becoming available, the strategies that work for a few tens of parallel evaluations become limiting, in particular due to the complexity of selecting more evaluations. This is even more crucial when the black-box is noisy, necessitating more evaluations as well as repeated experiments. Here we propose a scalable strategy that can keep up with massive batching natively, based on the exploration/exploitation trade-off and a portfolio allocation. We compare the approach with related methods on deterministic and noisy functions, for mono- and multi-objective optimization tasks. These experiments show similar or better performance than existing methods, while being orders of magnitude faster.
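As a hedged illustration of batching along the exploration/exploitation trade-off, the sketch below gives each of the q batch members its own weight in a lower-confidence-bound criterion computed from the surrogate. The paper's portfolio allocation is more elaborate; names and the criterion choice here are illustrative assumptions.

```python
# Illustrative sketch only: spread a large batch along the exploration/
# exploitation trade-off by varying the weight lambda in mu - lambda * sigma.
import numpy as np

def batch_from_tradeoff(mu, sd, q, lambdas=None, rng=None):
    """Pick q distinct candidates, each minimizing mu - lambda_i * sd."""
    if rng is None:
        rng = np.random.default_rng(0)
    if lambdas is None:
        lambdas = rng.exponential(1.0, q)
    taken = np.zeros(mu.shape[0], dtype=bool)
    chosen = []
    for lam in np.sort(lambdas):                   # exploitation -> exploration
        crit = np.where(taken, np.inf, mu - lam * sd)
        idx = int(np.argmin(crit))
        taken[idx] = True                          # enforce distinct picks
        chosen.append(idx)
    return chosen

rng = np.random.default_rng(4)
mu = rng.normal(size=1000)        # surrogate posterior means at candidates
sd = rng.uniform(0.1, 1.0, 1000)  # surrogate posterior standard deviations
print(batch_from_tradeoff(mu, sd, q=50, rng=rng)[:10])
```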
Sequential Learning of Active Subspaces
Wycoff, Nathan, Binois, Mickael, Wild, Stefan M.
In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, a surrogate model is often employed, on which finite differencing is performed. When the surrogate model is a Gaussian process, we show that the ASM estimator is available in closed form, rendering the finite-difference approximation unnecessary. We use this closed-form solution to develop acquisition functions for sequential learning tailored to sensitivity analysis on top of ASMs. We also show that the traditional ASM estimator may be viewed as a method of moments estimator for a certain class of Gaussian processes. We demonstrate how uncertainty on the Gaussian process hyperparameters may be propagated to uncertainty on the sensitivity analysis, allowing model-based confidence intervals on the active subspace. Our methodological developments are illustrated on several examples.
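For reference, the quantity targeted by ASMs can be written as below, using standard notation that is assumed rather than taken from the abstract (f is the target function and mu a probability measure on its inputs):

```latex
C \;=\; \int_{\mathcal{X}} \nabla f(x)\, \nabla f(x)^{\top} \, \mathrm{d}\mu(x)
  \;=\; W \Lambda W^{\top}
```

The active subspace is spanned by the leading eigenvectors (first columns of W) associated with the largest eigenvalues in Lambda; the closed form mentioned above takes the corresponding expectation with f replaced by a GP surrogate, avoiding finite differencing.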
A Bayesian optimization approach to find Nash equilibria
Picheny, Victor, Binois, Mickael, Habbal, Abderrahmane
Game theory nowadays finds a broad range of applications in engineering and machine learning. However, in a derivative-free, expensive black-box context, very few algorithmic solutions are available to find game equilibria. Here, we propose a novel Gaussian-process based approach for solving games in this context. We follow a classical Bayesian optimization framework, with sequential sampling decisions based on acquisition functions. Two strategies are proposed, based either on the probability of achieving equilibrium or on the Stepwise Uncertainty Reduction paradigm. Practical and numerical aspects are discussed in order to enhance scalability and reduce computation time. Our approach is evaluated on several synthetic game problems with varying numbers of players and decision space dimensions. We show that equilibria can be found reliably for a fraction of the cost (in terms of black-box evaluations) compared to classical, derivative-based algorithms. The method is available in the R package GPGame, on CRAN at https://cran.r-project.org/package=GPGame.
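For readers unfamiliar with the target, the sketch below spells out the pure-strategy Nash condition on a small discretized two-player game: a profile is an equilibrium if no player can lower its own cost by deviating unilaterally. This is a conceptual illustration only, not the GPGame algorithm; in the Bayesian optimization setting the exhaustive cost tables below are replaced by GP surrogates of expensive black-box evaluations.

```python
# Conceptual sketch (not GPGame): pure-strategy Nash equilibria of a
# finite two-player game where both players minimize their own cost.
import numpy as np

def nash_equilibria(cost1, cost2):
    """cost1[i, j], cost2[i, j]: costs of players 1 and 2 at profile (i, j)."""
    eq = []
    n1, n2 = cost1.shape
    for i in range(n1):
        for j in range(n2):
            best1 = cost1[:, j].min()      # best reply of player 1 to j
            best2 = cost2[i, :].min()      # best reply of player 2 to i
            if cost1[i, j] <= best1 and cost2[i, j] <= best2:
                eq.append((i, j))
    return eq

# Small synthetic game on a 4 x 4 strategy grid
rng = np.random.default_rng(5)
c1, c2 = rng.uniform(size=(4, 4)), rng.uniform(size=(4, 4))
print(nash_equilibria(c1, c2))
```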