Goto

Collaborating Authors

 Education



Spectral embedding for dynamic networks with stability guarantees

Neural Information Processing Systems

We consider the problem of embedding a dynamic network, to obtain time-evolving vector representations of each node, which can then be used to describe changes in behaviour of individual nodes, communities, or the entire graph. Given this open-ended remit, we argue that two types of stability in the spatio-temporal positioning of nodes are desirable: to assign the same position, up to noise, to nodes behaving similarly at a given time (cross-sectional stability) and a constant position, up to noise, to a single node behaving similarly across different times (longitudinal stability). Similarity in behaviour is defined formally using notions of exchangeability under a dynamic latent position network model. By showing how this model can be recast as a multilayer random dot product graph, we demonstrate that unfolded adjacency spectral embedding satisfies both stability conditions. We also show how two alternative methods, omnibus and independent spectral embedding, alternately lack one or the other form of stability.


An Online Method for AClass of Distributionally Robust Optimization with Non-Convex Objectives

Neural Information Processing Systems

In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable corresponding to the size of data is time expensive; (2) they are not friendly to online learning where data is coming sequentially. To address these issues, we consider a class of DRO with an KL divergence regularization on the dual variables, transform the minmax problem into a compositional minimization problem, and propose practical duality-free online stochastic methods without requiring a large mini-batch size. We establish the state-of-the-art complexities of the proposed methods with and without a Polyak-ลojasiewicz (PL) condition of the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method can speed up the training by more than 2 times than baseline methods and save days of training time on a large-scale dataset with 265K images, and (ii) verify the supreme performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can be also used for solving a family of stochastic compositional problems with state-of-the-art complexities.


Appendices ALow-Rank Matrix Factorization with Non-Uniform Sampling

Neural Information Processing Systems

In this section, we demonstrate the effectiveness of low-rank matrix factorization in recovering the label relationship matrix. We first present four important facts: f1: the rank of the matrix is equivalent to the number of classes. Specifically, this also means that if ห†Zi,k = 1, then ห†Zj,k = 1. We consider a toy example (without self-loops), ห†Z = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 A = 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 (14) In a standard LRMF problem, it is not possible to recover ห†Z from A since no entries are observed for the third and fourth rows. However, we can demonstrate how LRMF effectively performs in this situation. Recovery: We begin by assuming v1 is in class 1, resulting in U1,: = [1, 1, 1] and V1,: = [1,0,0]. By observing A1,4, we know that v4 is also in class 1, resulting in U4,: = [1, 1, 1]and V4,: = [1,0,0](f2). By analyzing A1,2 and A1,3, we determine that v2 and v3 do not belong to class 1.





Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Neural Information Processing Systems

Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent ฮฒ (that relates the test error t P ฮฒ to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher tis smaller than that of the student s, ฮฒ is a function of s only and does not depend on the input dimension. We confirm our predictions on ฮฒ empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.



Bandit Social Learning under Myopic Behavior

Neural Information Processing Systems

We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regards to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.1