Statistical Learning
Appendix for Self-Weighted Contrastive Learning among Multiple Views for Mitigating Representation Degeneration
We provide supplementary materials for the submission of Self-Weighted Contrastive Learning among Multiple Views for Mitigating Representation Degeneration. Specifically, Appendix A (Page 1) gives all theoretical proofs and the complexity analysis of SEM; Appendix B (Page 7) describes the experimental settings; Appendix C (Page 8) lists additional experimental results and analysis that are not shown in the paper for lack of space; Appendix D (Page 10) discusses the limitations of this paper and future work. The code implementation, trained models, and datasets used in our method are provided at https://github.com/SubmissionsIn/SEM.
I(X^v; H^v), (8) where W^{m,n} > 0 because the two views (v ∈ {m, n}) have positive class mutual information. Therefore, if H^v is the t_v-th layer's features (i.e., H^v_{(t_v)} acts as the regularized hidden features), we have I(S; Z^v) ≤ I(S; X^v). This design aims to separately maintain the different views' discriminative information through {H^v}_{v=1}^V and to explore their common semantic information through {Z^v}_{v=1}^V.
Projection-Free Online Convex Optimization via Efficient Newton Iterations
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain K ⊆ ℝ^d. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projections onto the convex set K to ensure feasibility of their iterates. Alternative algorithms, such as those based on the Frank-Wolfe method, swap potentially expensive Euclidean projections onto K for linear optimization over K. However, such algorithms have sub-optimal regret in OCO compared to projection-based algorithms. In this paper, we look at a third type of algorithm that outputs approximate Newton iterates using a self-concordant barrier for the set of interest. The use of a self-concordant barrier automatically ensures feasibility without the need for projections. However, computing the Newton iterates requires a matrix inverse, which can still be expensive. As our main contribution, we show how the stability of the Newton iterates can be leveraged to compute the inverse Hessian in only a vanishing fraction of the rounds, leading to a new efficient projection-free OCO algorithm with a state-of-the-art regret bound.
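For context, the projection step that classical methods rely on can be illustrated with a minimal sketch of Online Gradient Descent. This is not the paper's algorithm: it is the projection-based baseline the abstract contrasts against, and it assumes K is a Euclidean ball, one of the few sets where projection is cheap. The function names, the fixed step size, and the ball-shaped domain are all assumptions made for illustration.

```python
import math


def project_onto_ball(x, radius=1.0):
    """Euclidean projection of x onto the ball {y : ||y||_2 <= radius}.

    Assumption: K is a ball. For a general convex set this step can be
    expensive, which is what motivates projection-free methods.
    """
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [xi * scale for xi in x]


def online_gradient_descent(gradient_fns, dim, radius=1.0, step_size=0.1):
    """Play one iterate per round, then take a projected gradient step.

    gradient_fns: one callable per round, returning the gradient of that
    round's loss at the played point (hypothetical interface).
    Returns the list of iterates that were played.
    """
    x = [0.0] * dim
    played = []
    for grad_fn in gradient_fns:
        played.append(list(x))
        g = grad_fn(x)
        stepped = [xi - step_size * gi for xi, gi in zip(x, g)]
        x = project_onto_ball(stepped, radius)  # restores feasibility
    return played


# Usage: linear losses f_t(x) = <g, x> with a fixed g = (1, 0).
rounds = [lambda x: [1.0, 0.0]] * 50
iterates = online_gradient_descent(rounds, dim=2)
```

Every played iterate stays inside the ball only because of the call to `project_onto_ball`; the barrier-based approach described in the abstract aims to obtain the same feasibility guarantee without that call.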
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
We focus on the task of learning a single index model σ(w · x) with respect to the isotropic Gaussian distribution in d dimensions. Prior work has shown that the sample complexity of learning w is governed by the information exponent k of the link function σ, which is defined as the index of the first nonzero Hermite coefficient of σ.
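The definition of the information exponent can be made concrete with a small numerical sketch. It assumes the normalized probabilists' Hermite basis, with coefficients c_j = E[σ(z) He_j(z)] / sqrt(j!) for z ~ N(0, 1), and it assumes the index search starts at j = 1 (skipping the constant term); the abstract does not state either convention. The function names, the trapezoidal integration grid, and the tolerance are illustrative choices.

```python
import math


def hermite_values(z, max_degree):
    """Probabilists' Hermite polynomials He_0(z), ..., He_max_degree(z),
    from the recurrence He_{n+1}(z) = z * He_n(z) - n * He_{n-1}(z)."""
    values = [1.0, z]
    for n in range(1, max_degree):
        values.append(z * values[n] - n * values[n - 1])
    return values[: max_degree + 1]


def hermite_coefficients(sigma, max_degree=6, half_width=10.0, num_points=4001):
    """Approximate c_j = E[sigma(z) He_j(z)] / sqrt(j!) for z ~ N(0, 1),
    using the trapezoidal rule on [-half_width, half_width]."""
    step = 2.0 * half_width / (num_points - 1)
    sums = [0.0] * (max_degree + 1)
    for i in range(num_points):
        z = -half_width + i * step
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * step
        if i == 0 or i == num_points - 1:
            weight *= 0.5  # trapezoidal end-point weights
        s = sigma(z)
        he = hermite_values(z, max_degree)
        for j in range(max_degree + 1):
            sums[j] += weight * s * he[j]
    return [c / math.sqrt(math.factorial(j)) for j, c in enumerate(sums)]


def information_exponent(sigma, max_degree=6, tol=1e-6):
    """Smallest j >= 1 with |c_j| > tol, or None if none is found up to
    max_degree. Starting at j = 1 is an assumption of this sketch."""
    coeffs = hermite_coefficients(sigma, max_degree)
    for j in range(1, max_degree + 1):
        if abs(coeffs[j]) > tol:
            return j
    return None


# Usage: three link functions with different information exponents.
k_linear = information_exponent(lambda z: z)               # He_1
k_quadratic = information_exponent(lambda z: z * z - 1.0)  # He_2
k_cubic = information_exponent(lambda z: z**3 - 3.0 * z)   # He_3
```

Because the test functions here are themselves Hermite polynomials, orthogonality implies the three results should be 1, 2, and 3 respectively. Deciding "nonzero" numerically requires a tolerance, so the result for other link functions depends on `tol` and on the integration grid.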