
Appendix

Neural Information Processing Systems

In this section, we present some additional experiments. Empirical setup: most of the experimental setup is the same as in Section 6, except that we now use 5 parties instead of 3. A single data point in the YearPredictionMSD dataset has 90 dimensions, and we let each party hold 18 dimensions. Empirical results: we plot the training loss instead of the testing loss, since we are comparing different objective functions.

A.4 Experiments on other datasets

In this section, we present the experiment results on another dataset.
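As a concrete illustration of the setup above, the sketch below partitions a 90-dimensional feature matrix evenly across 5 parties, 18 dimensions each. The variable names and the synthetic stand-in data are our own assumptions; the original experiments use the actual YearPredictionMSD features.

```python
# Minimal sketch of the vertical feature split described above: 90 features
# partitioned evenly across 5 parties, 18 each. Synthetic data stands in
# for YearPredictionMSD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 90))       # stand-in for the real features

n_parties = 5
dims_per_party = X.shape[1] // n_parties  # 90 / 5 = 18

# party_data[k] holds the 18-dimensional feature block owned by party k
party_data = [X[:, k * dims_per_party:(k + 1) * dims_per_party]
              for k in range(n_parties)]

assert all(block.shape == (1000, 18) for block in party_data)
```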


Supplementary Information for: Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization

Neural Information Processing Systems

We note that all methods incur some sampling error, regardless of the subset size (N). In Fig. S6 we plot the learned hyperparameters of the Precipitation SVGP models: 1) $o^2$ (the kernel outputscale), which roughly corresponds to variance explained as "signal" in the data; 2) $\sigma^2_{\mathrm{obs}}$, which roughly corresponds to variance explained away as observational noise; and 3) $\nu$ (degrees of freedom), which controls the tails of the noise model (lower $\nu$ corresponds to heavier tails). As $M$ increases, we find that the observational noise parameter decreases by a factor of 4 (down from 0.19 to 0.05), while the kernel outputscale (Fig. S6, left) increases. Fig. S7 is a histogram displaying the msMINRES iterations needed to achieve a relative residual of $10^{-3}$ when training an $M = 5{,}000$ SVGP model on the 3droad dataset (subsampled to 30,000 data points).
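The stopping rule behind Fig. S7 is a relative-residual threshold on a Krylov solve against the kernel matrix. The sketch below counts iterations to reach $10^{-3}$ using plain conjugate gradients as a simple stand-in for msMINRES (the multi-shift variant used in the paper); the synthetic RBF Gram matrix and all sizes are our own assumptions, not the 3droad data.

```python
# Count Krylov iterations needed to drive ||b - A x|| / ||b|| below 1e-3.
# Plain CG is used here because the jittered Gram matrix is SPD; the paper
# itself uses msMINRES.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
K = np.exp(-cdist(X, X, "sqeuclidean"))   # RBF kernel Gram matrix
A = K + 1e-2 * np.eye(len(X))             # jitter keeps A well conditioned
b = rng.standard_normal(len(X))

def cg_iters_to_tol(A, b, rel_tol=1e-3, max_iter=1000):
    """Conjugate gradients; returns (x, iterations to reach rel_tol)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) / b_norm < rel_tol:
            return x, k
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, max_iter

x, iters = cg_iters_to_tol(A, b)
print(f"relative residual 1e-3 reached after {iters} iterations")
```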



7a006957be65e608e863301eb98e1808-Supplemental.pdf

Neural Information Processing Systems

In Appendix A, we review some classical statistical results for sparse linear regression. Let the design matrix be $X = (x_1, \ldots, x_n)^\top \in \mathbb{R}^{n \times d}$. Second, we derive a regret lower bound for the alternative bandit $\tilde{\theta}$.
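For concreteness, the sketch below instantiates this setting: rows $x_i$ of a design matrix $X \in \mathbb{R}^{n \times d}$ with $d > n$, responses generated from an $s$-sparse parameter vector. The lasso fit is our own illustrative choice; the appendix reviews classical results rather than prescribing an estimator.

```python
# Sparse linear regression in the high-dimensional regime d > n:
# y = X theta* + noise, with theta* s-sparse, recovered here via the lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, s = 200, 500, 5
X = rng.standard_normal((n, d))       # design matrix X = (x_1, ..., x_n)^T
theta_star = np.zeros(d)
theta_star[:s] = 1.0                  # s-sparse ground truth
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = Lasso(alpha=0.05).fit(X, y).coef_
print("recovered support:", np.flatnonzero(np.abs(theta_hat) > 1e-3))
```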



Estimating the intrinsic dimensionality using Normalizing Flows - Supplementary

Neural Information Processing Systems

With these conditions, a direct consequence is that the singular values in on-manifold directions will not depend on $\sigma^2$. Hence, if we fix the latent distribution to be standard Gaussian, the NF used to learn $q_{\sigma^2}$ must be $f$ for all $(u, v)$. However, these eigenvalues are exactly in the directions of large variability, i.e. the on-manifold directions. This was to be shown. Let us assume that $\sigma_1^2 = \cdots = \sigma_d^2$ in the following.

B.1 Lollipop

In [11], a manifold consisting of regions of different intrinsic dimensionality was considered: a one-dimensional line segment and a two-dimensional disk, such that the overall manifold resembles a lollipop.
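A minimal sketch of sampling this lollipop manifold follows from the description above: a 1D segment (the stick) attached to a 2D disk (the candy), so different regions have different intrinsic dimensionality. The sizes and placement are our own illustrative assumptions, not the exact construction of [11].

```python
# Lollipop manifold: a 1D line segment joined to a 2D disk, giving points
# with local intrinsic dimensionality 1 and 2 respectively.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# stick: 1D segment from (0, 0) to (1, 0)
t = rng.uniform(0.0, 1.0, size=n)
stick = np.column_stack([t, np.zeros(n)])

# candy: uniform samples from a disk of radius 0.5 centred at (1.5, 0)
r = 0.5 * np.sqrt(rng.uniform(size=n))    # sqrt makes the density uniform
phi = rng.uniform(0.0, 2 * np.pi, size=n)
disk = np.column_stack([1.5 + r * np.cos(phi), r * np.sin(phi)])

lollipop = np.concatenate([stick, disk])
```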



4aaa76178f8567e05c8e8295c96171d8-AuthorFeedback.pdf

Neural Information Processing Systems

Gradient descent: As illustrated by R1's example of f(x), our correctness condition for autodiff systems does not necessarily imply the correctness of gradient descent based on those systems (i.e., that gradient descent converges to Clarke critical points). This gives a partial answer to R3's question on possible drawbacks of using intensional derivatives. This is a good question that would lead to interesting future work.
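R1's exact f(x) is not reproduced in this excerpt; the sketch below is a standard illustration of the same gap, chosen by us. The function equals x everywhere, so its Clarke derivative is 1 at every point and no point is critical, yet autodiff returns 0 at x = 0, where gradient descent would stall.

```python
# f(x) = relu(x) - relu(-x) is identical to f(x) = x as a function, but
# autodiff evaluates it to 0 at x = 0 (PyTorch sets relu'(0) = 0), so
# gradient descent initialized at 0 stalls at a non-critical point.
import torch

x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)   # equals x for every input
f.backward()
print(x.grad)                        # tensor(0.), although f'(0) = 1
```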