initialization
Learning Infinite RBMs with Frank-Wolfe
In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.
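As a rough illustration of the mechanism described in this abstract, the sketch below shows the generic Frank-Wolfe pattern in which each iteration solves a linear-minimization subproblem to pick one new atom (read here as one new hidden unit) and then takes a convex-combination step. The gradient oracle and atom search (`grad_fn`, `atom_oracle`) are schematic placeholders, not the paper's actual likelihood-based oracles.

```python
def frank_wolfe_incremental(grad_fn, atom_oracle, n_iters=20):
    """Schematic Frank-Wolfe loop in which every iteration adds one atom.

    grad_fn(atoms, weights): gradient of the convex objective at the current
    sparse iterate; atom_oracle(grad): atom minimizing the linearized objective
    (interpreted here as the weight vector of one new hidden unit). Both are
    placeholders standing in for the paper's likelihood-based oracles.
    """
    atoms, weights = [], []
    for t in range(n_iters):
        grad = grad_fn(atoms, weights)
        new_atom = atom_oracle(grad)        # linear-minimization oracle
        gamma = 2.0 / (t + 2.0)             # classical Frank-Wolfe step size
        weights = [w * (1.0 - gamma) for w in weights]
        weights.append(gamma)               # the model grows by one hidden unit
        atoms.append(new_atom)
    return atoms, weights
```

Stopping the loop early simply yields a smaller finite model, which is the sense in which the iteration count doubles as a choice of the number of hidden units.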
Mixed Linear Regression with Multiple Components
In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is {\em locally strongly convex} in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for {\em constant} $K$. Furthermore, we show that our non-convex formulation can be extended to solve the {\em subspace clustering} problem as well. In particular, when initialized within a small constant distance of the true subspaces, our method converges to the global optimum (and recovers the true subspaces) in time {\em linear} in the number of points. Finally, our empirical results indicate that even with random initialization, our approach converges to the global optimum in linear time, providing a speed-up of up to two orders of magnitude.
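The abstract's recovery guarantee is local: once the tensor-method initialization lands in the strongly convex neighborhood of the ground truth, a standard descent procedure converges to it. The sketch below does not reproduce the paper's objective or its tensor initialization; it shows the simplest procedure of this local flavor for MLR, generic alternating minimization between sample-to-component assignments and per-component least squares, started from a user-supplied initialization.

```python
import numpy as np

def mlr_alternating_min(X, y, W_init, n_iters=50):
    """Generic alternating minimization for mixed linear regression.

    X: (N, d) measurements, y: (N,) responses, W_init: (K, d) initial models
    (e.g., produced by a spectral/tensor initialization near the truth).
    Each iteration assigns every sample to its best-fitting component and then
    refits each component by ordinary least squares on its assigned samples.
    """
    W = W_init.copy()
    for _ in range(n_iters):
        resid = (y[:, None] - X @ W.T) ** 2      # (N, K) squared residuals
        labels = resid.argmin(axis=1)            # hard assignment per sample
        for k in range(W.shape[0]):
            mask = labels == k
            if mask.sum() >= X.shape[1]:         # refit only well-populated components
                W[k] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    return W
```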
Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences
We provide two fundamental results on the population (infinite-sample) likelihood function of Gaussian mixture models with $M \geq 3$ components. Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians. We prove that the log-likelihood value of these bad local maxima can be arbitrarily worse than that of any global optimum, thereby resolving an open question of Srebro (2007). Our second main result shows that the EM algorithm (or a first-order variant of it) with random initialization will converge to bad critical points with probability at least $1-e^{-\Omega(M)}$. We further establish that a first-order variant of EM will not converge to strict saddle points almost surely, indicating that the poor performance of the first-order method can be attributed to the existence of bad local maxima rather than bad saddle points. Overall, our results highlight the necessity of careful initialization when using the EM algorithm in practice, even when applied in highly favorable settings.
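A minimal way to observe the phenomenon this abstract describes is to run EM in exactly its setting (equally weighted, unit-variance spherical components, only the means updated) from several random initializations and compare the final log-likelihoods. The sketch below is such an illustration, not the paper's construction or analysis.

```python
import numpy as np

def em_spherical_means(X, M, n_iters=200, seed=0):
    """EM for an equally weighted mixture of unit-variance spherical Gaussians,
    updating only the means. Random initialization draws the initial means from
    the data; different seeds can terminate at local maxima with very different
    final log-likelihoods."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=M, replace=False)]            # (M, d) random init
    for _ in range(n_iters):
        # E-step: responsibilities from squared distances to each mean
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)     # (N, M)
        logp = -0.5 * d2
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: means are responsibility-weighted averages
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    loglik = np.log(np.exp(-0.5 * d2).mean(axis=1) + 1e-300).sum()  # up to an additive constant
    return mu, loglik
```

Running this over many seeds on well-separated data and plotting the returned log-likelihoods exhibits the spread between good and bad fixed points that the abstract quantifies.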
Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues
Variational approximation has recently been widely used in large-scale Bayesian inference; its simplest form imposes a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field methods, theoretical studies of the loss-function surface and of the convergence behavior of the iterative updates used to optimize it are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM). Using batch coordinate ascent (BCAVI) for the updates, we give a complete characterization of all the critical points and show different convergence behaviors with respect to the initialization. When the model parameters are known, we show that a significant proportion of random initializations converge to the ground truth. On the other hand, when the parameters themselves need to be estimated, a random initialization will converge to an uninformative local optimum.
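For a two-class SBM with known within- and between-class edge probabilities p and q, the batch mean-field update has a simple closed form: every node's soft label becomes a sigmoid of a field computed from all other nodes' current soft labels. The sketch below implements that standard BCAVI iteration under assumed simplifications (symmetric model, uniform label prior, known p > q); how it behaves as a function of the initialization psi0 is exactly what the abstract characterizes.

```python
import numpy as np

def bcavi_two_class(A, p, q, psi0, n_iters=50):
    """Batch coordinate ascent (BCAVI) for a symmetric two-class SBM with known
    within-/between-class edge probabilities p and q and a uniform label prior.

    A: (n, n) symmetric 0/1 adjacency matrix; psi0: initial probabilities
    q(z_i = +1). All coordinates are updated simultaneously from the previous
    iterate (the batch scheme studied in the abstract)."""
    lam1 = np.log(p / q)                    # weight contributed by an observed edge
    lam0 = np.log((1 - p) / (1 - q))        # weight contributed by a non-edge
    W = A * lam1 + (1 - A) * lam0
    np.fill_diagonal(W, 0.0)                # exclude self-terms from the field
    psi = psi0.copy()
    for _ in range(n_iters):
        field = W @ (2 * psi - 1)           # field felt by each node from all others
        psi = 1.0 / (1.0 + np.exp(-field))  # batch sigmoid update
    return psi
```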
Initialization-Aware Score-Based Diffusion Sampling
Fassina, Tiziano, Cardoso, Gabriel, Le Corff, Sylvain, Romary, Thomas
Score-based generative models (SGMs) aim at generating samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long noising time horizon, typically inducing a large number of discretization steps and a high computational cost. In this work, we present a Kullback-Leibler convergence analysis of Variance Exploding diffusion samplers that highlights the critical role of the backward-process initialization. Based on this result, we propose a theoretically grounded sampling strategy that learns the reverse-time initialization, directly minimizing the initialization error. The resulting procedure is independent of the specific score training procedure, network architecture, and discretization scheme. Experiments on toy distributions and benchmark datasets demonstrate competitive or improved generative quality while using significantly fewer sampling steps.
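The part of the pipeline this abstract modifies is the distribution from which the backward process is started. The sketch below is a generic Euler-Maruyama reverse-time sampler for the Variance Exploding SDE that takes the initialization as an argument: passing a Gaussian N(0, sigma_max^2 I) sampler recovers the classical choice, while a learned initialization (the paper's proposal, not reproduced here) can be plugged in unchanged.

```python
import numpy as np

def ve_reverse_sampler(score_fn, init_sampler, sigmas, n_samples, dim, seed=0):
    """Euler-Maruyama sampler for the reverse-time Variance Exploding SDE.

    score_fn(x, sigma): estimated score of the noised data at noise level sigma;
    init_sampler(n, d): draws the backward-process initialization (the quantity
    the abstract proposes to learn); sigmas: decreasing noise schedule, with
    sigmas[0] the largest noise level."""
    rng = np.random.default_rng(seed)
    x = init_sampler(n_samples, dim)
    for i in range(len(sigmas) - 1):
        s_cur, s_next = sigmas[i], sigmas[i + 1]
        step = s_cur ** 2 - s_next ** 2                     # discretized g(t)^2 dt for the VE SDE
        x = x + step * score_fn(x, s_cur)                   # drift: follow the estimated score
        x = x + np.sqrt(step) * rng.standard_normal(x.shape)  # diffusion term
    return x
```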
Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare
Miller, Elizabeth W., Blume, Jeffrey D.
In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability using two complementary diagnostics: the empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and the empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and to the GUSTO-I clinical dataset. Across the settings we examine, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions than logistic regression models. Instability of risk estimates near clinically relevant decision thresholds can alter treatment recommendations. These findings suggest that stability diagnostics should be incorporated into routine model validation when assessing clinical reliability.
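One plausible reading of the two diagnostics (the paper's exact estimands may differ) is sketched below: retrain the model several times with different seeds, collect each patient's predicted risks, and report the width of the central 95% range of those risks (ePIW) together with the fraction of fits whose thresholded decision disagrees with that patient's majority decision (eDFR).

```python
import numpy as np

def instability_diagnostics(pred_matrix, threshold=0.5, coverage=0.95):
    """Per-individual instability diagnostics across repeated model fits.

    pred_matrix: (n_fits, n_patients) predicted risks from models retrained with
    different random seeds on the same data and architecture. Returns one
    plausible reading of the abstract's diagnostics; the paper's exact
    definitions may differ."""
    lo = np.quantile(pred_matrix, (1 - coverage) / 2, axis=0)
    hi = np.quantile(pred_matrix, 1 - (1 - coverage) / 2, axis=0)
    epiw = hi - lo                                   # empirical prediction interval width
    decisions = pred_matrix >= threshold             # (n_fits, n_patients) treat / no-treat
    majority = decisions.mean(axis=0) >= 0.5         # each patient's majority decision
    edfr = (decisions != majority).mean(axis=0)      # empirical decision flip rate
    return epiw, edfr
```

Patients whose ePIW straddles the clinical threshold, or whose eDFR is far from zero, are the ones for whom the seed-level randomness discussed above could change the recommended treatment.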
Path-conditioned training: a principled way to rescale ReLU neural networks
Lebeurrier, Arthur, Vayer, Titouan, Gribonval, Rémi
Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly affect the output of the proposed method. Numerical experiments illustrate its potential to speed up training.
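The symmetry this abstract exploits is that multiplying a hidden unit's incoming weights (and bias) by a positive constant while dividing its outgoing weights by the same constant leaves a ReLU network's function unchanged. The sketch below verifies this numerically and applies one simple conditioning choice within the orbit, balancing each unit's incoming and outgoing norms; this balancing rule is a classic illustration, not the path-lifting alignment criterion proposed in the paper.

```python
import numpy as np

def relu_mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def rescale_hidden(W1, b1, W2, scales):
    """Per-neuron rescaling symmetry: incoming weights (and bias) of hidden unit
    j are multiplied by scales[j] > 0, its outgoing weights are divided by it.
    By positive homogeneity of ReLU, the network function is unchanged."""
    return W1 * scales, b1 * scales, W2 / scales[:, None]

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 5, 16, 3
W1, b1 = rng.standard_normal((d_in, d_hidden)), rng.standard_normal(d_hidden)
W2, b2 = rng.standard_normal((d_hidden, d_out)), rng.standard_normal(d_out)
x = rng.standard_normal((10, d_in))

# One simple conditioning choice (not the paper's criterion): balance each unit's
# incoming and outgoing weight norms, i.e., pick a canonical point in its orbit.
scales = np.sqrt(np.linalg.norm(W2, axis=1) / np.linalg.norm(W1, axis=0))
W1r, b1r, W2r = rescale_hidden(W1, b1, W2, scales)
assert np.allclose(relu_mlp(x, W1, b1, W2, b2), relu_mlp(x, W1r, b1r, W2r, b2))
```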