Goto

Collaborating Authors

 committee machine


Improving Generalization by Permutation Routing Across Model Copies

arXiv.org Machine Learning

We introduce a use of the \(M\)-cover (or \(M\)-layer) transform for machine learning. The method replicates a model \(M\) times, but instead of coupling the copies through parameter averaging or an explicit attractive force, as in replicated SGD or Elastic SGD, it rewires the contexts in which local learning messages are computed. Each local loss is evaluated on a routed model whose parameters are drawn from different copies according to permutations sampled from a structured mixing kernel \(Q\). Training then uses the original local update rule, while the resulting learning messages are redistributed across the copies through these routed computational paths. Thus \(Q\) defines a topology for message transport and controls the long-loop structure of the lifted factor graph. We formulate this construction for perceptrons, committee machines, and multilayer perceptrons, showing that the same principle applies from discrete models to differentiable neural networks. The resulting framework provides a mechanism for improving generalization through structured message sharing rather than replica collapse or parameter-space coupling.


The committee machine: Computational to statistical gaps in learning a two-layers neural network

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows to perform optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it; strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.


The committee machine: Computational to statistical gaps in learning a two-layers neural network

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows to perform optimal learning in polynomial time for a large set of parameters.


The committee machine: Computational to statistical gaps in learning a two-layers neural network

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows to perform optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it; strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.


The committee machine: Computational to statistical gaps in learning a two-layers neural network

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine.


Reviews: The committee machine: Computational to statistical gaps in learning a two-layers neural network

Neural Information Processing Systems

The committee machine is a simple and natural model for a 2-layer neural network. The results of the paper also apply to many related models: you are allowed an arbitrary function mapping the K hidden values to the final binary output.) This paper studies the problem of learning the weights W under a natural random model. We are given m random examples (X,Y) where the input X (in R n) is iid Gaussian and Y (in { 1,-1}) is the associated output of the network. The unknown weights W are iid from a known prior.


\emph{Lifted} RDT based capacity analysis of the 1-hidden layer treelike \emph{sign} perceptrons neural networks

arXiv.org Machine Learning

We consider the memorization capabilities of multilayered \emph{sign} perceptrons neural networks (SPNNs). A recent rigorous upper-bounding capacity characterization, obtained in \cite{Stojnictcmspnncaprdt23} utilizing the Random Duality Theory (RDT), demonstrated that adding neurons in a network configuration may indeed be very beneficial. Moreover, for particular \emph{treelike committee machines} (TCM) architectures with $d\leq 5$ neurons in the hidden layer, \cite{Stojnictcmspnncaprdt23} made a very first mathematically rigorous progress in over 30 years by lowering the previously best known capacity bounds of \cite{MitchDurb89}. Here, we first establish that the RDT bounds from \cite{Stojnictcmspnncaprdt23} scale as $\sim \sqrt{d}$ and can not on their own \emph{universally} (over the entire range of $d$) beat the best known $\sim \log(d)$ scaling of the bounds from \cite{MitchDurb89}. After recognizing that the progress from \cite{Stojnictcmspnncaprdt23} is therefore promising, but yet without a complete concretization, we then proceed by considering the recently developed fully lifted RDT (fl RDT) as an alternative. While the fl RDT is indeed a powerful juggernaut, it typically relies on heavy numerical evaluations. To avoid such heavy numerics, we here focus on a simplified, \emph{partially lifted}, variant and show that it allows for very neat, closed form, analytical capacity characterizations. Moreover, we obtain the concrete capacity bounds that \emph{universally} improve for \emph{any} $d$ over the best known ones of \cite{MitchDurb89}.


Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds

arXiv.org Machine Learning

We study the capacity of \emph{sign} perceptrons neural networks (SPNN) and particularly focus on 1-hidden layer \emph{treelike committee machine} (TCM) architectures. Similarly to what happens in the case of a single perceptron neuron, it turns out that, in a statistical sense, the capacity of a corresponding multilayered network architecture consisting of multiple \emph{sign} perceptrons also undergoes the so-called phase transition (PT) phenomenon. This means: (i) for certain range of system parameters (size of data, number of neurons), the network can be properly trained to accurately memorize \emph{all} elements of the input dataset; and (ii) outside the region such a training does not exist. Clearly, determining the corresponding phase transition curve that separates these regions is an extraordinary task and among the most fundamental questions related to the performance of any network. Utilizing powerful mathematical engine called Random Duality Theory (RDT), we establish a generic framework for determining the upper bounds on the 1-hidden layer TCM SPNN capacity. Moreover, we do so for \emph{any} given (odd) number of neurons. We further show that the obtained results \emph{exactly} match the replica symmetry predictions of \cite{EKTVZ92,BHS92}, thereby proving that the statistical physics based results are not only nice estimates but also mathematically rigorous bounds as well. Moreover, for $d\leq 5$, we obtain the capacity values that improve on the best known rigorous ones of \cite{MitchDurb89}, thereby establishing a first, mathematically rigorous, progress in well over 30 years.


Statistical Mechanics of Learning in a Large Committee Machine

Neural Information Processing Systems

We use statistical mechanics to study generalization in large com(cid:173) mittee machines. For an architecture with nonoverlapping recep(cid:173) tive fields a replica calculation yields the generalization error in the limit of a large number of hidden units. For continuous weights the generalization error falls off asymptotically inversely proportional to Q, the number of training examples per weight. For binary weights we find a discontinuous transition from poor to perfect generalization followed by a wide region of metastability. Broken replica symmetry is found within this region at low temperatures.


Discontinuous Generalization in Large Committee Machines

Neural Information Processing Systems

The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights.