
Collaborating Authors

 bzi


Label consistency in overfitted generalized k-means

Neural Information Processing Systems

We provide theoretical guarantees for label consistency in generalized k-means problems, with an emphasis on the overfitted case, where the number of clusters used by the algorithm exceeds the true number of clusters. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels. We consider both exact and approximate recovery of the labels. Our results hold for any constant-factor approximation to the k-means problem. The results are also model-free and rely only on bounds on the maximum or average distance of the data points to the true cluster centers. These centers themselves are loosely defined and can be taken to be any set of points for which the aforementioned distances can be controlled. We show the usefulness of the results with applications to some manifold clustering problems.
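The notion of "close to a refinement" can be made concrete with a small experiment. The sketch below assumes NumPy and uses plain Lloyd iterations with k-means++ seeding as a stand-in solver. That is a heuristic substitution, not the paper's method, and it carries no constant-factor guarantee. The helper names `kmeans_lloyd` and `refinement_error` are hypothetical. The error measure counts how many points must be relabeled so that every estimated cluster sits inside a single true cluster. A value of 0.0 means the overfitted labels are an exact refinement of the true ones.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding (hypothetical helper).

    A heuristic stand-in for a k-means solver; it is NOT guaranteed to be a
    constant-factor approximation, which is what the theory assumes.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # k-means++ seeding: sample each new center with probability proportional
    # to the squared distance to the nearest center chosen so far.
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # Update step; an empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    # Final assignment with respect to the returned centers.
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return labels, centers


def refinement_error(est, true):
    """Fraction of points to relabel so that every estimated cluster lies
    inside a single true cluster (0.0 means an exact refinement)."""
    mistakes = 0
    for j in np.unique(est):
        members = true[est == j]
        # Points outside the majority true label of this estimated cluster.
        mistakes += members.size - np.bincount(members).max()
    return mistakes / true.size


# Toy data (assumed): two well-separated Gaussian blobs (true K = 2), fitted with k = 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([20.0, 0.0], 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)
labels, _ = kmeans_lloyd(X, k=5)
print(refinement_error(labels, y))
```

With blobs this far apart, every fitted cluster is expected to stay inside one blob, so the overfitted labeling splits the true clusters without merging them. Shrinking the separation relative to the spread is a simple way to watch the refinement error grow.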



Supplementary Material for: Adversarial Regression with Doubly Non-negative Weighting Matrices

Neural Information Processing Systems

A.1 Proofs of Section 3

In the following, the symbol $\langle \cdot, \cdot \rangle$ denotes both the Frobenius inner product of matrices and the standard Euclidean inner product of vectors. For the second part, let $v$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda_{\max}(A)$. If the maximum eigenvalue of $T$ is nonpositive, then from Lemma A.1 we see that the objective value of problem (A.2) evaluated … For a $p \times p$ real matrix $A$, its spectral radius $R(A)$ is defined as the largest absolute value of its eigenvalues. Then the matrix $I - A$ is invertible and all entries of $(I - A)^{-1}$ are nonnegative. Also, the spectral radius of $(\gamma^\star)^{-1} \widehat{\Omega}^{1/2} V(\beta) \widehat{\Omega}^{1/2}$ is smaller than 1 by the feasibility of $\gamma^\star$ in problem (A.5c).
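The excerpt says that $I - A$ is invertible with an entrywise nonnegative inverse, but the hypothesis is cut off. The sketch below assumes the standard setting for this fact: $A$ is entrywise nonnegative and $R(A) < 1$. In that case $(I - A)^{-1} = \sum_{k \ge 0} A^k$ (the Neumann series), and each term of the sum is nonnegative. This premise is our assumption, not text from the supplement. The code checks the claim numerically with NumPy on a random matrix, and the helper names `spectral_radius` and `neumann_inverse` are hypothetical.

```python
import numpy as np


def spectral_radius(A):
    # R(A): the largest absolute value among the eigenvalues of A.
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def neumann_inverse(A, n_terms=200):
    # Truncated Neumann series sum_{k=0}^{n_terms-1} A^k; it converges to
    # (I - A)^{-1} when R(A) < 1.
    p = A.shape[0]
    total = np.eye(p)
    term = np.eye(p)
    for _ in range(1, n_terms):
        term = term @ A
        total += term
    return total


rng = np.random.default_rng(0)
A = rng.random((4, 4))             # entrywise nonnegative (assumed premise)
A *= 0.5 / spectral_radius(A)      # rescale so that R(A) = 0.5 < 1
inv = np.linalg.inv(np.eye(4) - A)
print(spectral_radius(A))
print(bool(np.all(inv >= 0)))
print(np.allclose(inv, neumann_inverse(A)))
```

The rescaling step mirrors the role of $(\gamma^\star)^{-1}$ above: for any $\gamma > 0$ we have $R(\gamma^{-1} M) = R(M)/\gamma$. So a scaled matrix has spectral radius below 1 exactly when $\gamma > R(M)$.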