- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
We focus on six methods: (i) discriminative K-means (DisKmeans) of Ye et al. (2008); (ii) a discriminative clustering formulation described in Bach and Harchaoui (2008) and Flammarion et al. (2017). We compare two classes F of feature mappings: linear functions and fully-connected neural networks with one hidden layer of 100 nodes. An epoch refers to n/B = 12 consecutive iterations. The learning curves in Figure 1 show the advantage of neural networks and demonstrate the flexibility of CURE with nonlinear function classes. One of the main obstacles is the complicated piecewise definition of f, which prevents us from obtaining closed-form formulae.
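For concreteness, here is a minimal sketch of the two feature-mapping classes being compared; the input/output dimensions and the ReLU activation are illustrative assumptions not stated in the excerpt:

```python
import torch.nn as nn

d_in, d_out, H = 20, 5, 100   # d_in, d_out are illustrative; H = 100 hidden nodes as above

# Class 1: linear feature mappings f(x) = Wx + b.
f_linear = nn.Linear(d_in, d_out)

# Class 2: fully-connected network with one hidden layer of 100 nodes
# (the activation is not specified in the excerpt; ReLU is assumed).
f_nn = nn.Sequential(nn.Linear(d_in, H), nn.ReLU(), nn.Linear(H, d_out))
```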
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
- Asia > China (0.04)
69dd2eff9b6a421d5ce262b093bdab23-Paper.pdf
Quite interestingly, modern deep networks have been known to be powerful enough to interpolate even randomized labels (Zhang et al., 2017; Liu et al., 2020), a phenomenon that is usually referred to as memorization (Yun et al., 2019; Vershynin, 2020; Bubeck et al., 2020). In this work, we lift the spherical requirement and offer an exponential improvement on the dependence on δ.
- Europe > France (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > Canada (0.04)
31784d9fc1fa0d25d04eae50ac9bf787-Paper.pdf
Indeed, in learning applications, where symmetric tensors are formed from statistical moments (higher-order covariances) or multivariate derivatives (higher-order Hessians), CP decomposition has enabled parameter estimation for mixtures of Gaussians [20, 35], generalized linear models [34], shallow neural networks [19, 24, 42], deeper networks [17, 18, 30], and hidden Markov models [5], among others.
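As a concrete illustration of the mechanism, the sketch below builds a symmetric rank-k moment-style tensor and recovers a component via tensor power iteration, one classical route to a CP decomposition in the orthogonally decomposable case; the dimensions, weights, and orthogonality are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3
# Orthogonal components (the orthogonally decomposable case, assumed for simplicity).
A = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]
w = np.array([3.0, 2.0, 1.0])
# Symmetric rank-k tensor T = sum_i w_i * a_i (x) a_i (x) a_i.
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)

def tensor_power_iteration(T, iters=100):
    """Recover one component of an orthogonally decomposable symmetric tensor."""
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)     # the map u -> T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('abc,a,b,c->', T, u, u, u)   # eigenvalue lam = T(u, u, u)
    return lam, u

lam, u = tensor_power_iteration(T)
print(lam, np.abs(A.T @ u).round(3))   # u typically aligns with one component a_i, lam with its weight
```

With random restarts and deflation (subtracting the recovered term lam * u (x) u (x) u), this procedure recovers all k components, which is how moment tensors yield parameter estimates in the cited applications.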
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli
It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of full-batch gradient descent (GD, which always reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) remains unclear. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.
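A minimal numerical sketch of the strong-recovery claim, assuming Gaussian inputs and the noiseless quadratic link y = ⟨w*, x⟩²; the step size, iteration count, and n = 8d are illustrative choices, not the paper's exact schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 800                       # n a modest multiple of d (the n ~ d regime)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)      # ground-truth direction on the unit sphere
X = rng.standard_normal((n, d))       # Gaussian inputs (assumption)
y = (X @ w_star) ** 2                 # quadratic activation of the single index

# Full-batch gradient descent on the squared loss, from small initialization.
w = 1e-3 * rng.standard_normal(d)
lr = 0.5 / d
for t in range(2000):
    s = X @ w
    grad = (4.0 / n) * X.T @ ((s**2 - y) * s)   # grad of (1/n) * sum((<w,x>^2 - y)^2)
    w -= lr * grad

overlap = abs(w @ w_star) / np.linalg.norm(w)
print(f"overlap = {overlap:.3f}")     # close to 1 indicates strong recovery of ±w_star
```

In this toy run the small initialization is first amplified along w* over roughly log d steps, after which the residual drives the iterate to ±w*, matching the qualitative picture in the abstract.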
- North America > United States > California (0.14)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
Inverse Mixed-Integer Programming: Learning Constraints then Objective Functions
In mixed-integer linear programming, data-driven inverse optimization, which learns the objective function and the constraints from observed data, plays an important role in constructing appropriate mathematical models in various fields, including power systems and scheduling. However, to the best of our knowledge, there is no known method for learning both the objective function and the constraints. In this paper, we propose a two-stage method for a class of problems in which the objective function is expressed as a linear combination of functions and the constraints are represented by functions and thresholds. Specifically, our method first learns the constraints and then learns the objective function. On the theoretical side, we show that the proposed method can solve inverse optimization problems from a finite dataset, develop a statistical learning theory for pseudometric spaces and sub-Gaussian distributions, and construct a statistical learning framework for inverse optimization. On the experimental side, we demonstrate that our method is practically applicable to scheduling problems formulated as integer linear programs with up to 100 decision variables, which are typical in real-world settings.
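To make the two-stage pipeline concrete, here is a hedged toy sketch on a small integer grid; the feature functions phi_k, constraint functions g_j, and the max-margin grid search are illustrative stand-ins, not the paper's construction:

```python
import itertools
import numpy as np

# Observed solutions are optima of  max_x  sum_k c_k * phi_k(x)
#                                   s.t.   g_j(x) <= b_j,  x integer;
# we learn the thresholds b first, then the objective weights c.
phis = [lambda x: x[0] + x[1], lambda x: -x[0]]      # candidate objective features phi_k
gs   = [lambda x: x[0] + 2 * x[1], lambda x: x[0]]   # known constraint functions g_j

# Generate one "observed" optimum from hidden c*, b* (for demonstration only).
grid = list(itertools.product(range(6), repeat=2))
c_true, b_true = np.array([1.0, 0.3]), np.array([8.0, 4.0])
feasible = [x for x in grid if all(g(x) <= b for g, b in zip(gs, b_true))]
obs = [max(feasible, key=lambda x: c_true @ np.array([p(x) for p in phis]))]

# Stage 1: tightest thresholds that keep every observed solution feasible.
b_hat = np.array([max(g(x) for x in obs) for g in gs])

# Stage 2: choose weights (grid search over the 1-simplex here) so that each observed
# solution is optimal, with maximal margin, among points feasible for the learned constraints.
feas_hat = [x for x in grid if all(g(x) <= b for g, b in zip(gs, b_hat))]
score = lambda c, x: c @ np.array([p(x) for p in phis])
best_c, best_margin = None, -np.inf
for wgt in np.linspace(0.0, 1.0, 21):
    c = np.array([wgt, 1.0 - wgt])
    margin = min(score(c, x) - max(score(c, z) for z in feas_hat if z != x) for x in obs)
    if margin > best_margin:
        best_c, best_margin = c, margin

print("learned thresholds:", b_hat)   # tight at the observed optimum
print("learned weights:", best_c)     # a max-margin point of the cone of consistent objectives
```

Note that inverse optimization identifies the objective only up to the cone of weights under which the observations are optimal; the max-margin rule above is just one way to pick a representative from that cone.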
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > France (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)