 length scale


Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal

Neural Information Processing Systems

Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence.
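To make the role of the length scale concrete, the following minimal sketch evaluates a squared-exponential kernel at a fixed input distance for a shrinking sequence of length scales. The kernel choice, the halving schedule, and the function names are assumptions made for illustration; this is neither the A-GP-UCB schedule nor the method proposed in the paper.

```python
import math


def rbf_kernel(x1, x2, length_scale):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def halving_schedule(initial_length_scale, steps):
    """Hypothetical schedule (an assumption): halve the length scale each step."""
    return [initial_length_scale / (2 ** i) for i in range(steps)]


length_scales = halving_schedule(1.0, 4)
# Prior correlation between two inputs that are 0.5 apart, per length scale.
correlations = [rbf_kernel(0.0, 0.5, ls) for ls in length_scales]

for ls, c in zip(length_scales, correlations):
    print(f"length scale {ls:.3f}: correlation at distance 0.5 = {c:.4f}")
```

As the length scale shrinks, the prior correlation between nearby inputs drops, so the model admits rougher functions. This is the sense in which decreasing the length scale expands the class of functions considered.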




Appendix

Neural Information Processing Systems

Fitting T1-mGPLVM to the binned spike data, we found that the inferred latent state was highly correlated with the true head direction (Figure 5b). Here we make this connection more explicit. As described in the main text, the Lie algebra g of a group G is a vector space tangent to G at its identity element. However, because the Lie algebra is isomorphic to R^n, we have found it convenient in both our exposition and our implementation to work directly with the pair (R^n, Exp_G), instead of (g, exp_G). We begin by noting that S^n is not a Lie group unless n = 1 or n = 3, thus we can only apply the ReLie framework to S^1 and S^3.
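As a rough illustration of working with a map from R^n onto the group rather than with the abstract Lie algebra, the sketch below writes out one possible exponential map for the circle and for the 3-sphere (unit quaternions). The function names and the angle convention (rotation by the full norm of the tangent vector, not half of it) are assumptions and may differ from the paper's implementation.

```python
import math


def exp_s1(theta):
    """Map a tangent coordinate in R^1 to a point (cos, sin) on the circle S^1."""
    return (math.cos(theta), math.sin(theta))


def exp_s3(v):
    """Map a tangent vector in R^3 to a unit quaternion (w, x, y, z) on S^3.

    Assumed convention: Exp(v) = (cos|v|, sin|v| * v / |v|).
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-12:
        # At (or numerically at) the origin, return the identity element
        # to avoid dividing by zero.
        return (1.0, 0.0, 0.0, 0.0)
    scale = math.sin(norm) / norm
    return (math.cos(norm), v[0] * scale, v[1] * scale, v[2] * scale)


point_on_circle = exp_s1(math.pi / 2)
unit_quaternion = exp_s3((0.3, -0.2, 0.5))
quaternion_norm = math.sqrt(sum(c * c for c in unit_quaternion))
```

Both maps take unconstrained Euclidean coordinates and return points that satisfy the unit-norm constraint by construction, which is what makes the Euclidean parameterisation convenient in practice.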


7 Supplementary Material

Neural Information Processing Systems

The sample explanatory features were fed into a multi-layer perceptron, then the learned latent features and sample spatial locations were fed into a Gaussian process model. GP variance is used as the uncertainty measure. We first constructed a spatial graph based on each sample's k-nearest-neighbor by spatial distance. The model contains two GCN layers. It contains a multi-level graph neural network to capture the long-range interactions among particles with linear complexity.
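The k-nearest-neighbor graph construction mentioned above can be sketched as follows. This is a brute-force version assuming Euclidean distance on 2-D coordinates; the source states neither the metric nor the value of k, so both (and the function name `knn_graph`) are illustrative assumptions.

```python
import math


def knn_graph(coords, k):
    """Return a dict mapping each sample index to its k nearest neighbours.

    Neighbours are ranked by Euclidean distance (an assumption); ties are
    broken by the lower sample index.
    """
    graph = {}
    for i, p in enumerate(coords):
        # Distance from sample i to every other sample.
        dists = [(math.dist(p, q), j) for j, q in enumerate(coords) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Hypothetical sample locations, used only to exercise the function.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(coords, k=2)
```

The resulting edges are directed: an isolated sample lists its two closest samples as neighbours, but those samples need not list it in return. Graph layers that expect an undirected graph would need the edge list symmetrised first.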


We would like to thank the reviewers for taking the time to provide us with helpful feedback and will definitely

Neural Information Processing Systems

Below are our clarifications for the questions raised. The stability of the factorization under other kernels will be important to study in future work. Therefore, it may not make sense to compare LFP channels of different rats. In Section 3.3, the horseshoe prior is used on the loadings as an illustration of the methodology when ... The effective sample size is 276 (median) for the loadings but falls short for the length scales. Bayesian hypothesis testing in general and the null hypothesis formulation in this case are not well-defined.



An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications

arXiv.org Machine Learning

Learning from non-independent and non-identically distributed data poses a persistent challenge in statistical learning. In this study, we introduce data-dependent Bernstein inequalities tailored for vector-valued processes in Hilbert space. Our inequalities apply to both stationary and non-stationary processes and exploit the potential rapid decay of correlations between temporally separated variables to improve estimation. We demonstrate the utility of these bounds by applying them to covariance operator estimation in the Hilbert-Schmidt norm and to operator learning in dynamical systems, achieving novel risk bounds. Finally, we perform numerical experiments to illustrate the practical implications of these bounds in both contexts.


Data-driven Approach for Interpolation of Sparse Data

arXiv.org Machine Learning

Extracting information about hadron resonances requires fitting theoretical models to experimental data. However, this data often comes from different experiments measuring different physical quantities in varying kinematic regions; studying coupled channels with different kinematic coverages and binning can make direct comparison challenging. The consistency of these datasets directly impacts the quality of the fit, thus making it difficult to accurately constrain the theoretical models. Sparse datasets in key kinematic regions further complicate the quantification of uncertainties, often requiring arbitrary weighting that may introduce bias. A robust approach to solving these problems involves utilising Gaussian Processes (GPs), a Bayesian inference machine learning technique that provides probabilistic predictions for unknown datapoints. Unlike traditional machine learning methods, GPs do not require any training; instead, they operate on three fundamental assumptions: 1. Some kernel function can be defined to measure the covariance between known datapoints; 2. This same kernel function can be used to predict the covariance between unknown datapoints; 3. Some idea of the form of the posterior distribution is known (e.g.