492114f6915a69aa3dd005aa4233ef51-Supplemental.pdf

Neural Information Processing Systems 

A deterministic path uses self-attention and cross-attention to summarize contexts.

B.1 1D Regression Architectures

For models without attention (CNP, NP, BNP), we set ℓ_pre = 4, ℓ_post = 2, ℓ_dec = 3, d_h = 128. For NP we set d_z = 128.

For Student-t noise, we added ε ∼ γ · T(2.1) to the curves generated from a GP with RBF kernel, where T(2.1) is a Student's t distribution with 2.1 degrees of freedom and γ ∼ Unif(0, 0.15).

After realizing them, the prior functions are used to optimize via Bayesian optimization.
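The Student-t noise construction can be illustrated with a minimal NumPy sketch. Only the noise model (a Student's t distribution with 2.1 degrees of freedom, scaled by γ drawn from Unif(0, 0.15)) comes from the text. The input range, number of points, kernel length scale and output scale, the jitter term, and the choice to draw one γ per curve are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(x, length_scale=0.5, scale=1.0):
    """RBF kernel matrix for 1D inputs (hyperparameters are assumed values)."""
    diff = x[:, None] - x[None, :]
    return scale ** 2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_noisy_curve(rng, num_points=50, jitter=1e-5):
    """Sample one GP curve and corrupt it with scaled Student-t noise."""
    # Assumed input range; the text does not state it.
    x = np.sort(rng.uniform(-2.0, 2.0, size=num_points))

    # Draw a zero-mean GP sample through a Cholesky factor of the kernel.
    cov = rbf_kernel(x) + jitter * np.eye(num_points)
    chol = np.linalg.cholesky(cov)
    y_clean = chol @ rng.standard_normal(num_points)

    # Noise from the text: eps ~ gamma * T(2.1), gamma ~ Unif(0, 0.15).
    # Assumption: a single gamma is shared by all points of the curve.
    gamma = rng.uniform(0.0, 0.15)
    eps = gamma * rng.standard_t(df=2.1, size=num_points)

    return x, y_clean, y_clean + eps, gamma


rng = np.random.default_rng(0)
x, y_clean, y_noisy, gamma = sample_noisy_curve(rng)
```

With 2.1 degrees of freedom the noise variance is finite but the tails are heavy, so individual points can land far from the clean curve even when γ is small.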
