

Neural Information Processing Systems

Let A be an n×n Hermitian matrix and let B be an (n−1)×(n−1) matrix constructed by deleting the i-th row and i-th column of A. Denote Φ = [ϕ(x1), ..., ϕ(xn)]ᵀ ∈ R^{n×D}, where D is the dimension of the feature space H. Performing rank-n singular value decomposition (SVD) on Φ, we have Φ = HΣVᵀ, where H ∈ R^{n×n}, Σ ∈ R^{n×n} is a diagonal matrix whose diagonal elements are the singular values of Φ, and V ∈ R^{D×n}. F(α) in Eq. (21) is proven differentiable, and the p-th component of its gradient is ∂F(α)/∂αp = ⋯. Then, a reduced gradient descent algorithm [26] is adopted to optimize Eq. (21). The three deep neural networks are pre-trained on ImageNet [5].
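As a minimal sketch of the factorization above (the sizes and random features are illustrative assumptions, not from the paper), the rank-n SVD Φ = HΣVᵀ with the stated shapes H ∈ R^{n×n}, Σ ∈ R^{n×n}, V ∈ R^{D×n} can be checked with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 5, 8                        # n samples in a D-dimensional feature space (toy sizes)
Phi = rng.standard_normal((n, D))  # rows play the role of the feature maps phi(x_i)

# Rank-n (thin) SVD: Phi = H @ Sigma @ V.T with H (n x n), Sigma (n x n), V (D x n).
H, s, Vt = np.linalg.svd(Phi, full_matrices=False)
Sigma = np.diag(s)   # singular values of Phi on the diagonal
V = Vt.T

assert H.shape == (n, n) and Sigma.shape == (n, n) and V.shape == (D, n)
assert np.allclose(Phi, H @ Sigma @ V.T)   # the factorization reconstructs Phi
```

With `full_matrices=False`, NumPy returns exactly the thin factors used in the text when n ≤ D, so no truncation step is needed.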


Pairwise Learning

Neural Information Processing Systems

The following lemma provides moment bounds for a summation of weakly dependent and mean-zero random functions with bounded increments under a change of any single coordinate [1, 10]. The stated bound then follows by combining the above two inequalities. Note that A(S0) is independent of S and can be regarded as a fixed model if we only consider the randomness induced by S. In this section, we present the proofs related to stability and generalization for pairwise learning with convex and smooth loss functions. For any i ∈ [n], define Si as in (3.3).
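The stability argument above compares a learning algorithm on a dataset S and on a neighboring dataset Si in which the i-th example is replaced. A minimal sketch of the two objects involved, the pairwise empirical risk over all ordered pairs and the perturbed dataset, is below; the squared pairwise loss and the helper names are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def pairwise_risk(w, S, loss):
    """Empirical pairwise risk: average of loss(w, z_i, z_j) over all ordered pairs i != j."""
    n = len(S)
    total = sum(loss(w, S[i], S[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def perturb(S, i, z_new):
    """S_i: the dataset S with its i-th example replaced by an independent copy."""
    Si = list(S)
    Si[i] = z_new
    return Si

# Toy squared pairwise loss (an assumption for illustration only).
loss = lambda w, z, zp: (w * (z - zp)) ** 2

rng = np.random.default_rng(1)
S = list(rng.standard_normal(4))
Si = perturb(S, 0, float(rng.standard_normal()))
# A stability bound compares the models trained on S and on the neighbor Si.
```

For example, with S = [0, 1] and w = 2 this risk averages the loss over the two ordered pairs, giving 4 for the toy loss above.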