Practical Efficient Global Optimization is No-regret

Wang, Jingyi, Wang, Haowei, Chiang, Nai-Yuan, Mueller, Juliane, Hartland, Tucker, Petra, Cosmin G.

arXiv.org Machine Learning

Efficient global optimization (EGO) is one of the most widely used noise-free Bayesian optimization algorithms. It comprises the Gaussian process (GP) surrogate model and the expected improvement (EI) acquisition function. In practice, when EGO is applied, a scalar matrix with a small positive value (also called a nugget or jitter) is usually added to the covariance matrix of the deterministic GP to improve numerical stability. We refer to this EGO with a positive nugget as the practical EGO. Despite its wide adoption and empirical success, cumulative regret bounds for practical EGO have yet to be established. In this paper, we present for the first time a cumulative regret upper bound for practical EGO. In particular, we show that practical EGO has sublinear cumulative regret bounds and thus is a no-regret algorithm for commonly used kernels, including the squared exponential (SE) and Matérn kernels ($ν>\frac{1}{2}$). Moreover, we analyze the effect of the nugget on the regret bound and discuss the theoretical implications for its choice. Numerical experiments are conducted to support and validate our findings.
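The two ingredients named in the abstract, a GP posterior computed with a nugget on the covariance diagonal and the EI acquisition function, can be illustrated with a small dependency-free sketch. This is not the authors' implementation. It assumes scalar inputs, an SE kernel with unit length scale and unit variance, a minimization convention for EI, and a default nugget of `1e-6`. All function names (`se_kernel`, `gp_posterior`, `expected_improvement`) are hypothetical and chosen for illustration only.

```python
import math


def se_kernel(a, b, length_scale=1.0):
    """Squared exponential kernel for scalar inputs (unit variance assumed)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, x_new, nugget=1e-6):
    """Posterior mean and standard deviation at x_new for a zero-mean GP.

    The nugget is added to the diagonal of the covariance matrix, i.e. the
    system solved is (K + nugget * I), which is the 'practical' variant.
    """
    n = len(xs)
    K = [[se_kernel(xs[i], xs[j]) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    k_star = [se_kernel(x, x_new) for x in xs]
    alpha = chol_solve(L, ys)       # (K + nugget*I)^{-1} y
    v = chol_solve(L, k_star)       # (K + nugget*I)^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = se_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative round-off


def expected_improvement(mean, std, best):
    """Closed-form EI for minimization, given the best observed value so far."""
    if std == 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


# Usage: score a candidate point given three noise-free observations.
xs, ys = [0.0, 1.0, 2.0], [1.0, 0.0, 1.0]
mu, sigma = gp_posterior(xs, ys, 1.5)
ei = expected_improvement(mu, sigma, min(ys))
```

With a positive nugget the posterior standard deviation at an already sampled point is small but not exactly zero, so EI there stays slightly positive. That is the kind of deviation from the exact noise-free model that the nugget introduces.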



Deep Explicit Duration Switching Models for Time Series

Neural Information Processing Systems

Time series forecasting plays a key role in informing industrial and business decisions [17,24,8], while segmentation is useful for understanding biological and physical systems [40,45,34].



Latent Template Induction with Gumbel-CRFs Appendix

Neural Information Processing Systems

Papandreou and Yuille [4] proposed the Perturb-and-MAP Random Field, an efficient sampling method for general Markov Random Fields. We compare the detailed structure of the gradients of each estimator. All gradients are formed as a summation over the steps. The Gumbel-CRF and PM-MRF estimators can be decomposed with a pathwise term, where we take the gradient of f w.r.t. Since the official test set is not publicly available, we use the same training/validation/test split as Fu et al. [1].





min

Neural Information Processing Systems

Recall that $x = \arg\min_{a \in \mathcal{A}} x^\top θ$, so $x$ can be viewed as a deterministic function of $θ$. " log p(zn|θ) (1/|Nε|) P Since $R_{\max}$ is the upper bound of the maximum expected reward, the second term can be bounded by $2R_{\max}γ$. We let $Φ \in \mathbb{R}^{|\mathcal{A}| \times d}$ be the feature matrix, where each row of $Φ$ represents an action in $\mathcal{A}$. We summarize the procedure of estimating t, It in Algorithm 3. Let $X$ denote the feasible set.


59112692262234e3fad47fa8eabf03a4-Paper.pdf

Neural Information Processing Systems

However, extrinsic rewards may be insufficiently informative to encourage an agent to explore and understand its environment, particularly in partially observed settings where the agent has a limited view of its environment.