The committee machine: Computational to statistical gaps in learning a two-layers neural network

Benjamin Aubin, Antoine Maillard, jean barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters.


Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Grant Rotskoff, Eric Vanden-Eijnden

Neural Information Processing Systems

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function.
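As a rough illustration of this particle picture (our own toy sketch, not the authors' code), each neuron of a mean-field two-layer network can be viewed as a particle whose parameters relax under gradient descent, with the interaction between particles induced by the shared loss residual:

```python
import numpy as np

rng = np.random.default_rng(4)

# Target function on [-1, 1] and a two-layer network whose m "particles" are
# the neuron parameters (a_j, w_j); the output averages over particles.
def target(x):
    return np.sin(3 * x)

m = 50
a = rng.standard_normal(m)              # output weights (particle "charges")
w = rng.standard_normal(m)              # input weights (particle "positions")

x = np.linspace(-1, 1, 200)
y = target(x)

def net(a, w, x):
    return (np.tanh(np.outer(x, w)) @ a) / m     # mean-field 1/m scaling

loss0 = np.mean((net(a, w, x) - y) ** 2)

# Gradient descent on the squared loss; each particle feels a force depending
# on all the others through the residual, i.e. an interaction set by the loss.
lr = 1.0
for _ in range(2000):
    resid = net(a, w, x) - y                     # shape (200,)
    act = np.tanh(np.outer(x, w))                # shape (200, m)
    grad_a = act.T @ resid / (m * len(x))
    grad_w = ((1 - act ** 2) * x[:, None]).T @ resid * a / (m * len(x))
    a -= lr * grad_a
    w -= lr * grad_w
```

Relaxing the particles (gradient descent) lowers the training loss; the 1/m scaling is what makes the large-m limit a system of weakly interacting particles.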



A Unified Analysis of Federated Learning with Arbitrary Client Participation

Neural Information Processing Systems

The objective (1) can be extended to a weighted average, but we do not write out the weights and consider them as part of $\ell_n(x,\xi)$ and $F_n(x)$.


2063a00c435aafbcc58c16ce1e522139-Paper-Conference.pdf

Neural Information Processing Systems

Amongst those functions, the simplest are single-index models $f(x) = \phi(x \cdot \theta)$, where the labels are generated by an arbitrary non-linear scalar link function $\phi$ applied to an unknown one-dimensional projection $\theta$ of the input data.
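A small synthetic example (our own, with an illustrative choice of link function) showing the defining property of a single-index model: the labels depend on the input only through its one-dimensional projection along $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 20, 1000
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # unknown unit-norm direction

X = rng.standard_normal((n, d))         # isotropic Gaussian inputs

phi = np.tanh                           # illustrative nonlinear link function
y = phi(X @ theta)                      # labels depend on x only via x . theta

# Sanity check: shifting inputs orthogonally to theta leaves labels unchanged.
v = rng.standard_normal(d)
v -= (v @ theta) * theta                # project v onto the orthogonal complement
y_shifted = phi((X + v) @ theta)
assert np.allclose(y, y_shifted)
```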


45d74e190008c7bff2845ffc8e3facd3-Supplemental-Conference.pdf

Neural Information Processing Systems

In a typical supervised learning task, one is given a training dataset of $n \in \mathbb{N}$ labeled samples $\mathcal{D} = ((x_i, y_i) \in \mathbb{R}^d \times \mathbb{R})_{i \in [n]}$, and a parametric model with $m \in \mathbb{N}$ parameters, $f: \mathbb{R}^m \times \mathbb{R}^d \to \mathbb{R}$. The task is to find parameters fitting the training data, i.e. find $\theta \in \mathbb{R}^m$ such that $\forall i \in [n]$, $f(\theta; x_i) \approx y_i$.
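A minimal sketch of this setup (our own toy instance, using a linear model so that $m = d$): generate noiseless data, then find $\theta$ with $f(\theta; x_i) \approx y_i$ by gradient descent on the squared loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance: n samples in R^d, a linear parametric model with m = d parameters.
n, d = 200, 5
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true                      # noiseless labels

def f(theta, X):
    return X @ theta

# Fit the training data by gradient descent on the mean squared error.
theta = np.zeros(d)
lr = 0.1
for _ in range(500):
    resid = f(theta, X) - y
    grad = X.T @ resid / n
    theta -= lr * grad

assert np.max(np.abs(f(theta, X) - y)) < 1e-3   # f(theta; x_i) ≈ y_i for all i
```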


On Uniform Convergence and Low-Norm Interpolation Learning

Neural Information Processing Systems

But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
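A quick numerical illustration (ours, not from the paper) of the object being analyzed: in the overparameterized regime many parameter vectors interpolate the data, and the pseudoinverse solution is the one of minimal $\ell_2$ norm among them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Overparameterized regime: more features than samples, so interpolators abound.
n, d = 30, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimal-norm interpolator: among all theta with X @ theta = y, least l2 norm.
theta_min = np.linalg.pinv(X) @ y
assert np.allclose(X @ theta_min, y)             # zero training error

# Any other interpolator = theta_min plus a null-space component, hence larger norm.
v = rng.standard_normal(d)
v -= np.linalg.pinv(X) @ (X @ v)                 # project v onto the null space of X
theta_other = theta_min + v
assert np.allclose(X @ theta_other, y)           # still interpolates
assert np.linalg.norm(theta_other) >= np.linalg.norm(theta_min)
```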


Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments

Engstrøm, Ole-Christian Galbo

arXiv.org Artificial Intelligence

Cross-validation is a widely used technique for assessing the performance of predictive models on unseen data. Many predictive models, such as Kernel-Based Partial Least-Squares (PLS) models, require the computation of $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ using only training set samples from the input and output matrices, $\mathbf{X}$ and $\mathbf{Y}$, respectively. In this work, we present three algorithms that efficiently compute these matrices. The first allows no column-wise preprocessing. The second allows column-wise centering around the training set means. The third allows column-wise centering and column-wise scaling around the training set means and standard deviations. We demonstrate their correctness and superior computational complexity: they offer significant cross-validation speedups compared with straightforward cross-validation and previous work on fast cross-validation, all without data leakage. Their suitability for parallelization is highlighted with an open-source Python implementation combining our algorithms with Improved Kernel PLS.
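The downdating idea behind the first (no-preprocessing) case can be sketched as follows. This is our own minimal NumPy illustration, not the paper's implementation; the centering and scaling variants additionally have to adjust for training-set means and standard deviations.

```python
import numpy as np

rng = np.random.default_rng(3)

N, d, k = 100, 8, 3
X = rng.standard_normal((N, d))
Y = rng.standard_normal((N, k))

# Precompute the full (unpreprocessed) cross-product matrices once.
XtX_full = X.T @ X
XtY_full = X.T @ Y

# For each CV fold, subtract the held-out rows' contribution instead of
# recomputing the training-set products from scratch.
fold = np.arange(0, 20)                 # indices of one validation fold
Xv, Yv = X[fold], Y[fold]

XtX_train = XtX_full - Xv.T @ Xv
XtY_train = XtY_full - Xv.T @ Yv

# Check against the straightforward recomputation on the training rows only.
mask = np.ones(N, dtype=bool)
mask[fold] = False
assert np.allclose(XtX_train, X[mask].T @ X[mask])
assert np.allclose(XtY_train, X[mask].T @ Y[mask])
```

Each fold then costs work proportional to the fold size rather than to the full training set, which is where the speedup over straightforward cross-validation comes from.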