Goto

Collaborating Authors

 lengthscale





Estimator

Neural Information Processing Systems

Figure caption: observations o = δ_x are sampled with uniform distribution on x ∼ U[1, 3] (shown in blue). The estimator f̂_λ is calculated 500 times for different realizations of the training data (10 example predictors are shown as dashed lines); its mean and a 2-standard-deviation band are shown in red. The true function f(x) = x² + 2 cos(4x) is shown in black.

Preliminary: big-P notation. Throughout our proofs, we will frequently rely on a polynomial analogue of the big-O notation, which we call big-P (Definition 1). Let us observe that all the quantities we study (the predictor, the risk and the empirical risk) stay the same if any observation o_i is replaced by −o_i. The existence and uniqueness of the solution, in the cone spanned by 1 and 1/z, of the equation can be argued as follows.
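To make the figure's setup concrete, here is a minimal sketch of the described experiment, assuming f̂_λ is a kernel ridge regression estimator with an RBF kernel; the estimator family, kernel, lengthscale, ridge parameter, noise level, and sample size are assumptions, since the excerpt only specifies the true function, the sampling distribution of x, and the 500 repetitions.

```python
import numpy as np

# Assumed setup: kernel ridge regression with an RBF kernel. Lengthscale,
# ridge parameter lambda, and training-set size n are illustrative choices,
# not values from the paper.
def f_true(x):
    return x**2 + 2 * np.cos(4 * x)

def rbf_kernel(a, b, lengthscale=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lengthscale**2)

def fit_predict(x_train, y_train, x_test, lam=1e-2):
    K = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train) @ alpha

rng = np.random.default_rng(0)
x_test = np.linspace(1.0, 3.0, 200)
preds = []
for _ in range(500):                      # 500 realizations of the training data
    x_train = rng.uniform(1.0, 3.0, 30)   # x ~ U[1, 3]; n = 30 is assumed
    y_train = f_true(x_train)             # noiseless observations o = delta_x (assumed)
    preds.append(fit_predict(x_train, y_train, x_test))

preds = np.array(preds)
mean, std = preds.mean(axis=0), preds.std(axis=0)  # mean and 2-std band as in the figure
```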


6 Supplementary material

6.1 Animal ethics statement

All experiments on animals were conducted with approval of the Animal Care and Use Committee of the University of California, Berkeley.

Neural Information Processing Systems

All computational procedures were performed either on a desktop workstation running Ubuntu 18.04. By minimising off-target activation, Bayesian target optimisation could enable (e.g.). Here we provide further mathematical details for optimising holographic stimuli. Next we must evaluate the partial derivative on the right-hand side of Equation 13. The covariance between a GP and its derivative is given by [40, Sec. 9.4]. Simulations consisted of both ORF mapping and stimulus optimisation phases. For reference, a typical ORF mean function is given in Figure S2.
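For context, the standard identity referenced here (the covariance between a GP and its partial derivative, as covered in [40, Sec. 9.4]) can be written as follows; the kernel symbol k and the index notation are ours and need not match the excerpt's Equation 13.

```latex
% Standard GP-derivative identities (not necessarily the excerpt's Equation 13):
% if f ~ GP(0, k), differentiation is a linear operator, so
\mathrm{Cov}\!\left[f(\mathbf{x}),\, \frac{\partial f(\mathbf{x}')}{\partial x'_d}\right]
  = \frac{\partial k(\mathbf{x}, \mathbf{x}')}{\partial x'_d},
\qquad
\mathrm{Cov}\!\left[\frac{\partial f(\mathbf{x})}{\partial x_c},\, \frac{\partial f(\mathbf{x}')}{\partial x'_d}\right]
  = \frac{\partial^2 k(\mathbf{x}, \mathbf{x}')}{\partial x_c\, \partial x'_d}.
```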




Thin and deep Gaussian processes

Neural Information Processing Systems

Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction successively parameterizes the lengthscale of a kernel, improving interpretability but ultimately giving up the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data, maintaining the concept of latent embeddings while also retaining the interpretation of lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well on standard benchmark datasets.
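As a rough illustration of the idea described in the abstract (each layer produces locally linear, input-dependent projections that feed a standard kernel), here is a minimal NumPy sketch; the specific parameterization of W(x), the RBF outer kernel, and all dimensions are assumptions for exposition, not the paper's actual construction.

```python
import numpy as np

# Conceptual sketch (not the paper's implementation): an input-dependent
# linear map W(x) projects x to a lower-dimensional embedding h(x) = W(x) x,
# and a standard RBF kernel is evaluated on the embeddings. In TDGP the
# entries of W(.) would themselves be GP layers; here a fixed smooth
# function stands in for them.
D, Q = 5, 2  # input dimension and (lower) embedding dimension, assumed

rng = np.random.default_rng(0)
A = rng.normal(size=(Q, D))
B = rng.normal(size=(Q, D))

def W(x):
    # Smooth, input-dependent mixing of two projection matrices (stand-in for GP layers).
    s = 1.0 / (1.0 + np.exp(-x.sum()))
    return s * A + (1.0 - s) * B

def embed(X):
    return np.stack([W(x) @ x for x in X])  # locally linear embedding h(x) = W(x) x

def tdgp_like_kernel(X1, X2, lengthscale=1.0):
    H1, H2 = embed(X1), embed(X2)
    sq = ((H1[:, None, :] - H2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)  # RBF kernel on the embeddings

X = rng.normal(size=(4, D))
K = tdgp_like_kernel(X, X)  # 4x4 kernel matrix over the embedded inputs
```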


We Still Don't Understand High-Dimensional Bayesian Optimization

Doumont, Colin, Fan, Donney, Maus, Natalie, Gardner, Jacob R., Moss, Henry, Pleiss, Geoff

arXiv.org Machine Learning

High-dimensional spaces have challenged Bayesian optimization (BO). Existing methods aim to overcome this so-called curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly, we demonstrate that these approaches are outperformed by arguably the simplest method imaginable: Bayesian linear regression. After applying a geometric transformation to avoid boundary-seeking behavior, Gaussian processes with linear kernels match state-of-the-art performance on tasks with 60- to 6,000-dimensional search spaces. Linear models offer numerous advantages over their non-parametric counterparts: they afford closed-form sampling and their computation scales linearly with data, a fact we exploit on molecular optimization tasks with > 20,000 observations. Coupled with empirical analyses, our results suggest the need to depart from past intuitions about BO methods in high-dimensional spaces.
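To illustrate why linear surrogates are attractive here (closed-form posterior sampling and cost linear in the number of observations), below is a minimal Bayesian linear regression sketch; the prior and noise variances and the Thompson-sampling acquisition are illustrative assumptions, and the geometric input transformation mentioned in the abstract is not included.

```python
import numpy as np

def blr_posterior(X, y, alpha=1.0, sigma2=0.1):
    """Posterior over weights for y = X w + noise, with prior w ~ N(0, alpha * I).

    Cost is O(n d^2), i.e. linear in the number of observations n.
    """
    d = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(d) / alpha        # posterior precision
    cov = np.linalg.inv(A)
    mean = cov @ X.T @ y / sigma2
    return mean, cov

def thompson_sample_argmax(X_cand, mean, cov, rng):
    """Draw one closed-form posterior sample of the weights, then pick the best candidate."""
    w = rng.multivariate_normal(mean, cov)
    return X_cand[np.argmax(X_cand @ w)]

# Toy usage in a 60-dimensional space (data and dimensions are made up).
rng = np.random.default_rng(0)
n, d = 200, 60
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

mean, cov = blr_posterior(X, y)
X_cand = rng.normal(size=(1000, d))
x_next = thompson_sample_argmax(X_cand, mean, cov, rng)  # next point to evaluate
```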