Rethinking Trust Region Bayesian Optimization in High Dimensions
Wei-Ting Tang, Joel A. Paulson
Trust Region Bayesian Optimization (TuRBO) is an effective strategy for alleviating the curse of dimensionality in high-dimensional black-box optimization. However, inappropriate lengthscale design can cause the local Gaussian process (GP) model within the trust region to degenerate, leading to suboptimal performance in high dimensions. In this work, we show that TuRBO's local GP may remain either excessively complex or overly simple as the dimension $D$ and trust region side length $L$ vary. To address this issue, we propose a straightforward variant, AdaScale-TuRBO, which scales the GP lengthscale with both the problem dimension and trust region size, thereby preserving kernel geometry and maintaining consistent prior complexity. Empirically, we show that AdaScale-TuRBO can robustly outperform standard TuRBO and other popular high-dimensional BO methods on synthetic benchmarks and real-world trajectory planning tasks.
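The abstract does not state the exact scaling rule. The following is only a sketch under an assumed rule $\ell = c\,L\sqrt{D}$, where $c$ is a hypothetical constant and `adascale_lengthscale` is a hypothetical helper; neither is taken from the paper. The sketch shows why scaling with both $D$ and $L$ can keep the kernel geometry fixed. For two points drawn uniformly from a trust region of side $L$, each coordinate contributes $\mathbb{E}[(u - v)^2] = L^2/6$, so $\mathbb{E}\lVert x - x' \rVert^2 = D L^2 / 6$. Dividing by $\ell^2 = c^2 L^2 D$ gives the constant $1/(6c^2)$. The typical distance between points, measured in lengthscales, then no longer depends on $D$ or $L$.

```python
import math
import random


def adascale_lengthscale(dim: int, side: float, c: float = 1.0) -> float:
    """Hypothetical scaling rule (assumption, not the paper's): ell = c * L * sqrt(D)."""
    return c * side * math.sqrt(dim)


def expected_scaled_sq_dist(dim: int, side: float, lengthscale: float) -> float:
    """Analytic E||x - x'||^2 / ell^2 for x, x' uniform in a cube of side `side`."""
    # Per coordinate, E[(u - v)^2] = 2 * Var(u) = side**2 / 6 for u, v ~ U[0, side].
    return dim * side**2 / 6.0 / lengthscale**2


def mc_scaled_sq_dist(dim: int, side: float, lengthscale: float,
                      n: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio, as a sanity check on the formula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += sum((rng.uniform(0, side) - rng.uniform(0, side)) ** 2
                     for _ in range(dim))
    return total / n / lengthscale**2


if __name__ == "__main__":
    for dim, side in [(10, 0.8), (100, 0.8), (100, 0.1), (1000, 0.05)]:
        fixed = expected_scaled_sq_dist(dim, side, lengthscale=1.0)
        scaled = expected_scaled_sq_dist(dim, side, adascale_lengthscale(dim, side))
        print(f"D={dim:5d} L={side:.2f}  fixed ell: {fixed:8.4f}  scaled ell: {scaled:.4f}")
```

With a fixed lengthscale ($\ell = 1$), the ratio changes with $D$ and $L$. Large values mean points in the trust region are nearly uncorrelated under a squared-exponential kernel, and small values mean the prior is nearly constant across it. This is one loose reading of the "excessively complex or overly simple" regimes. The scaled column stays at $1/6 \approx 0.1667$.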
Supplementary Material for Kernel Identification Through Transformers
A Background: Self-Attention
Since the attention mechanism is rarely used within the GP literature, we provide a brief review of the topic in this section. We follow the description of attention given by Vaswani et al. [8], including its extensions to self-attention and multi-head self-attention. The dot-product attention mechanism [8] takes as input a set of queries, keys and values. The queries and keys have dimension $D_z$, and the values have dimension $D_v$, which may differ from $D_z$. Dot-product attention computes weights from the similarity between the queries and the keys, and uses these weights to form a weighted linear combination of the input values.
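The following is a minimal NumPy sketch of the mechanism just described, for illustration only. Vaswani et al. scale the query-key dot products by $1/\sqrt{D_z}$ and normalise them with a softmax over the keys, and the sketch follows that form. The function names are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Scaled dot-product attention.

    q: (n_q, D_z) queries, k: (n_k, D_z) keys, v: (n_k, D_v) values.
    Returns the outputs (n_q, D_v) and the attention weights (n_q, n_k).
    """
    d_z = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_z)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row is a distribution over the keys
    return weights @ v, weights         # weighted combination of the values


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))  # 4 queries with D_z = 8
keys = rng.normal(size=(6, 8))     # 6 keys with D_z = 8
values = rng.normal(size=(6, 3))   # 6 values with D_v = 3
out, weights = dot_product_attention(queries, keys, values)
print(out.shape, weights.shape)    # (4, 3) (4, 6)

# Self-attention: queries, keys and values all come from the same set.
# Identity projections are used here purely for simplicity.
x = rng.normal(size=(5, 8))
self_out, _ = dot_product_attention(x, x, x)
```

In self-attention, the queries, keys and values are usually produced by separate learned linear projections of the same inputs. The identity projections above only keep the example short.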
By minimising off-target activation, Bayesian target optimisation could enable, for example, more precise synaptic connectivity mapping, improving our understanding of neural circuitry. This advancement has potential implications for understanding brain disorders such as epilepsy, where abnormal synaptic connections are central to seizure generation and propagation. A deeper understanding of these diseases could lead to better targeted interventions and more effective therapeutic strategies, benefiting individuals with neurological disorders.

First, we develop the approach for single optogenetic targets, as this is most closely related to existing GP-based receptive field inference techniques. We use a GP-Bernoulli approach to model the response $y_{nt}$ of neuron $n$ on trial $t$ to a single-target stimulus $x_t$,

$$y_{nt} \sim \mathrm{Bernoulli}\big(\sigma(g_n(x_t))\big), \qquad (9)$$

where the stimulus $x_t = (c_{1t}, c_{2t}, I_t) \in \mathbb{R}^3$ represents the two-dimensional coordinates and laser power of the $t$-th hologram.
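The following sketch simulates the generative model in Equation (9) under several assumptions that the text does not fix. It uses a zero-mean GP with a squared-exponential kernel over $(c_1, c_2, I)$, a logistic link for $\sigma$, and hypothetical lengthscales and stimulus ranges. It is an illustration of a GP-Bernoulli model, not the authors' implementation.

```python
import numpy as np


def se_kernel(a: np.ndarray, b: np.ndarray, lengthscales: np.ndarray,
              variance: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel with one lengthscale per input dimension."""
    diff = (a[:, None, :] - b[None, :, :]) / lengthscales
    return variance * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic link (an assumption for sigma in Equation (9))."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_responses(stimuli: np.ndarray, lengthscales: np.ndarray,
                       rng: np.random.Generator, jitter: float = 1e-6):
    """Draw g_n ~ GP(0, k) at the stimuli, then y_nt ~ Bernoulli(sigmoid(g_n(x_t)))."""
    K = se_kernel(stimuli, stimuli, lengthscales)
    chol = np.linalg.cholesky(K + jitter * np.eye(len(stimuli)))
    g = chol @ rng.standard_normal(len(stimuli))  # latent GP sample at each stimulus
    p = sigmoid(g)                                # response probability per trial
    y = rng.binomial(1, p)                        # binary responses y_nt
    return g, p, y


rng = np.random.default_rng(1)
T = 50  # hypothetical number of trials
stimuli = np.column_stack([
    rng.uniform(-20.0, 20.0, T),  # c_1t: hypothetical range
    rng.uniform(-20.0, 20.0, T),  # c_2t: hypothetical range
    rng.uniform(10.0, 60.0, T),   # I_t: hypothetical laser power range
])
g, p, y = simulate_responses(stimuli, lengthscales=np.array([8.0, 8.0, 20.0]), rng=rng)
```

Inference runs in the opposite direction. Given observed $y_{nt}$, one would approximate the posterior over $g_n$, for example with a Laplace or variational approximation. The sketch above only covers the forward model.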
Observations $o = \delta_x$ are sampled with $x \sim \mathcal{U}[-1, 3]$ (shown in blue). $\hat{f}_\lambda$ is computed 500 times for different realizations of the training data (10 example predictors are shown as dashed lines); its mean and two standard deviations are shown in red. The true function $f(x) = x^2 + 2\cos(4x)$ is shown in black.

Preliminary: Big-P notation

Throughout our proofs, we will frequently rely on a polynomial analogue of the big-O notation, which we call big-P:

Definition 1.

Let us observe that all the quantities we study (the predictor, the risk and the empirical risk) stay the same if any observation $o_i$ is replaced by $-o_i$. The existence and the uniqueness of the solution of the equation in the cone spanned by $1$ and $1/z$ can be argued as follows.
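The experiment in the caption can be sketched as repeated kernel ridge regression. The caption specifies only the sampling distribution, the number of repetitions and the true function. The squared-exponential kernel with lengthscale 0.5, the training-set size $N = 20$, the ridge value $\lambda = 10^{-3}$, the $(K + N\lambda I)^{-1}$ normalisation and the noiseless labels are all hypothetical choices made for this illustration.

```python
import numpy as np


def true_f(x: np.ndarray) -> np.ndarray:
    """True function from the caption: f(x) = x^2 + 2 cos(4x)."""
    return x**2 + 2.0 * np.cos(4.0 * x)


def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float = 0.5) -> np.ndarray:
    """Squared-exponential kernel on scalars (hypothetical kernel choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def krr_predict(x_train, y_train, x_test, lam):
    """Kernel ridge regression: f_hat(x) = k(x, X) (K + N * lam * I)^{-1} y."""
    n = len(x_train)
    K = rbf(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf(x_test, x_train) @ alpha


def repeat_experiment(n_train=20, lam=1e-3, n_reps=500, seed=0):
    """Refit f_hat on n_reps independent training sets; return grid, mean and std."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 3.0, 200)
    preds = np.empty((n_reps, x_test.size))
    for r in range(n_reps):
        x = rng.uniform(-1.0, 3.0, n_train)                # x ~ U[-1, 3]
        preds[r] = krr_predict(x, true_f(x), x_test, lam)  # noiseless labels (assumption)
    return x_test, preds.mean(axis=0), preds.std(axis=0)


x_grid, mean, std = repeat_experiment()
lower, upper = mean - 2.0 * std, mean + 2.0 * std  # band of two standard deviations
```

Plotting `mean`, `lower` and `upper` against `x_grid`, together with `true_f(x_grid)`, gives a figure of the kind the caption describes.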
6 Supplementary material

6.1 Animal ethics statement

All experiments on animals were conducted with the approval of the Animal Care and Use Committee of the University of California, Berkeley.
All computational procedures were performed either on a desktop workstation running Ubuntu 18.04

Here we provide further mathematical details for optimising holographic stimuli. Next we must evaluate the partial derivative on the right-hand side of Equation 13. The covariance between a GP and its derivative is given in [40, Sec 9.4].

Simulations consisted of both ORF mapping and stimulus optimisation phases. For reference, a typical ORF mean function is shown in Figure S2.
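Equation 13 and the cited covariance expression are not reproduced in this section. The general identity behind it is that differentiation is a linear operator, so for a GP $f$ with kernel $k$, $\operatorname{cov}\big(f(x), \partial f(x')/\partial x'_d\big) = \partial k(x, x')/\partial x'_d$. The following sketch assumes a squared-exponential kernel, which may differ from the kernel used in the paper. For that kernel the derivative is $k(x, x')\,(x_d - x'_d)/\ell^2$, and the sketch checks this formula against central finite differences. All function names are illustrative.

```python
import numpy as np


def se_kernel(x: np.ndarray, x_prime: np.ndarray,
              lengthscale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential kernel k(x, x') for two single points (1-D arrays)."""
    r2 = np.sum((x - x_prime) ** 2)
    return variance * np.exp(-0.5 * r2 / lengthscale**2)


def cov_f_grad(x, x_prime, lengthscale=1.0, variance=1.0) -> np.ndarray:
    """cov(f(x), df(x')/dx'_d) for every d, i.e. the gradient of k w.r.t. x'."""
    k = se_kernel(x, x_prime, lengthscale, variance)
    return k * (x - x_prime) / lengthscale**2


def finite_difference_grad(x, x_prime, lengthscale=1.0, variance=1.0, h=1e-6):
    """Central differences of k(x, x') w.r.t. x', used only to check cov_f_grad."""
    grad = np.empty_like(x_prime, dtype=float)
    for d in range(x_prime.size):
        e = np.zeros_like(x_prime, dtype=float)
        e[d] = h
        grad[d] = (se_kernel(x, x_prime + e, lengthscale, variance)
                   - se_kernel(x, x_prime - e, lengthscale, variance)) / (2 * h)
    return grad


x = np.array([0.3, -1.2, 2.0])
x_prime = np.array([1.0, 0.5, 1.5])
analytic = cov_f_grad(x, x_prime, lengthscale=0.8, variance=2.0)
numeric = finite_difference_grad(x, x_prime, lengthscale=0.8, variance=2.0)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The same identity, applied once more, gives the covariance between two derivatives as a mixed second derivative of $k$.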