
Neural Information Processing Systems

The authors of [62] consider Algorithm 2 for the stochastic generalized linear bandit problem. Assume that $\theta$ is the true parameter of the reward model. Then we consider the lower bounds. For $f_j(A) = \langle \tfrac{1}{2}(e_{j_1}e_{j_2}^T + e_{j_2}e_{j_1}^T), A \rangle$ with $j_1 \neq j_2$, $f_j(A_i)$ is 1 only when $i = j$ and 0 otherwise. Combining Claim D.12 and Claim D.11, we get that $g \leq C\sqrt{q}$. To get 1), we write $V_l = [v_1, \dots, v_l] \in \mathbb{R}^{d \times l}$ and $\bar{V}_l = [v_{l+1}, \dots, v_k]$.
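The indicator property $f_j(A_i) = \mathbb{1}\{i = j\}$ can be checked numerically. A minimal sketch, assuming the arm matrices are the symmetric outer products $A_i = e_{i_1}e_{i_2}^T + e_{i_2}e_{i_1}^T$ (an assumption for illustration, not stated in the snippet):

```python
import numpy as np

d = 4
# Index pairs (j1, j2) with j1 != j2 defining the symmetric basis elements.
pairs = [(0, 1), (0, 2), (1, 3)]

def e(i):
    """Standard basis vector e_i in R^d."""
    v = np.zeros(d)
    v[i] = 1.0
    return v

def f(j, A):
    """f_j(A) = <(1/2)(e_{j1} e_{j2}^T + e_{j2} e_{j1}^T), A>, a Frobenius inner product."""
    j1, j2 = pairs[j]
    F = 0.5 * (np.outer(e(j1), e(j2)) + np.outer(e(j2), e(j1)))
    return float(np.sum(F * A))

# Hypothetical arm matrices A_i = e_{i1} e_{i2}^T + e_{i2} e_{i1}^T.
arms = [np.outer(e(i1), e(i2)) + np.outer(e(i2), e(i1)) for i1, i2 in pairs]

# The Gram matrix [f_j(A_i)] is the identity: 1 on the diagonal, 0 elsewhere.
gram = np.array([[f(j, A) for A in arms] for j in range(len(pairs))])
print(gram)
```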



Neural Information Processing Systems

In the deterministic setting, where the data is given without any probabilistic assumptions, significant advances in DP linear regression have been made [77, 57, 68, 16, 7, 83, 31, 67, 82, 71]. In the randomized setting, each example $\{x_i, y_i\}$ is drawn i.i.d. We explain the closely related ones in Section 2.3, with an analysis for the case when the covariance matrix has a spectral gap. The resulting utility guarantees are the same as those from [23], which are discussed in Section 2.3. When privacy is not required, we know from Theorem 2.2 that under Assumptions A.1-A.3, we can achieve an error rate of $O(\kappa\sqrt{V/n})$.
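The spectral-gap condition on the covariance matrix can be illustrated in a few lines. A minimal sketch, where reading the gap as the difference $\lambda_1 - \lambda_2$ between the top two eigenvalues is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sample data whose covariance has one dominant direction.
n, d = 500, 5
X = rng.normal(size=(n, d))
X[:, 0] *= 3.0  # inflate variance along the first coordinate

Sigma = (X.T @ X) / n             # empirical covariance matrix
eigvals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
gap = eigvals[0] - eigvals[1]     # spectral gap between top two eigenvalues
print(f"eigenvalues: {eigvals.round(2)}, gap: {gap:.2f}")
```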



$\|y - Xv\|$

Neural Information Processing Systems

We focus on six methods: (i) discriminative K-means (DisKmeans) in Ye et al. (2008); (ii) a discriminative clustering formulation described in Bach and Harchaoui (2008) and Flammarion et al. (2017). We compare two classes $\mathcal{F}$ of feature mappings: linear functions and fully-connected neural networks with one hidden layer of 100 nodes. An epoch refers to $n/B = 12$ consecutive iterations. The learning curves in Figure 1 show the advantage of the neural network and demonstrate the flexibility of CURE with nonlinear function classes. One of the main obstacles is the complicated piecewise definition of $f$, which prevents us from obtaining closed-form formulae.
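The two feature-mapping classes being compared can be sketched as follows. This is a hypothetical illustration, not the paper's implementation; only the 100 hidden nodes come from the text, and the input/output dimensions and weight initialization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, hidden = 8, 3, 100

# Class 1: linear feature mapping phi(x) = W x.
W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
def phi_linear(x):
    return W @ x

# Class 2: fully-connected network with one hidden layer of 100 ReLU units.
W1 = rng.normal(size=(hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, hidden)) / np.sqrt(hidden)
def phi_mlp(x):
    return W2 @ np.maximum(W1 @ x, 0.0)  # ReLU nonlinearity

x = rng.normal(size=d_in)
print(phi_linear(x).shape, phi_mlp(x).shape)  # both map R^8 -> R^3
```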


Supplementary Material

Neural Information Processing Systems

This is the appendix for "A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks". Layer $L$ consists of a single node: the output neuron. Note that skip connections are allowed, i.e., there can be connections between non-consecutive layers. We now explain how to derive Proposition 1 (with an arbitrary range $[a, b]$) as a straightforward consequence of Proposition 7. Proof (of Proposition 1). In order to apply Proposition 7, we reduce the problem from $[a, b]$ to $[0, 1]$ by translating and rescaling every function in $\mathcal{G}$.
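The reduction from $[a, b]$ to $[0, 1]$ is the standard affine change of variables $\tilde{g}(t) = g(a + (b - a)t)$. A minimal sketch, with hypothetical function names:

```python
def rescale_to_unit(g, a, b):
    """Given g defined on [a, b], return g_tilde on [0, 1] via the affine
    substitution g_tilde(t) = g(a + (b - a) * t); as t runs over [0, 1],
    the argument a + (b - a) * t runs over [a, b]."""
    return lambda t: g(a + (b - a) * t)

g = lambda x: x ** 2                    # example function on [a, b] = [2, 5]
g_tilde = rescale_to_unit(g, 2.0, 5.0)
print(g_tilde(0.0), g_tilde(1.0))       # endpoints map to g(2) = 4.0, g(5) = 25.0
```

Since the substitution is a bijection between the two intervals, approximation error in sup or $L^p$ norm transfers between $g$ on $[a, b]$ and $\tilde{g}$ on $[0, 1]$ (up to the constant Jacobian factor $(b - a)^{1/p}$ for $L^p$).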





Private Non-smooth ERM and SCO in Subquadratic Steps

Neural Information Processing Systems

We study the differentially private Empirical Risk Minimization (ERM) and Stochastic Convex Optimization (SCO) problems for non-smooth convex functions.
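As a concrete (non-private) instance of the non-smooth convex ERM problem, here is a subgradient-method sketch on the absolute-loss objective $\frac{1}{n}\sum_i |x_i^T w - y_i|$. The data, step-size schedule, and iteration count are assumptions for illustration, and no differential privacy mechanism is included; it only shows the kind of objective the paper's setting covers:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true  # noiseless targets, so the optimal risk is 0

def erm_loss(w):
    """Non-smooth empirical risk: mean absolute residual."""
    return float(np.mean(np.abs(X @ w - y)))

w = np.zeros(d)
for t in range(1, 2001):
    residual = X @ w - y
    # A subgradient of the mean absolute loss (sign is a valid subgradient of |.|).
    g = X.T @ np.sign(residual) / n
    w -= 0.5 / np.sqrt(t) * g  # diminishing step size ~ 1/sqrt(t)

print(f"objective at zero: {erm_loss(np.zeros(d)):.4f}, after training: {erm_loss(w):.4f}")
```

The $1/\sqrt{t}$ step size is the classical choice for subgradient methods on non-smooth objectives, where gradient descent's smoothness-based rates do not apply.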