Average-case hardness of RIP certification
The restricted isometry property (RIP) for design matrices guarantees optimal recovery in sparse linear models and is of central interest in compressed sensing and statistical learning, particularly for computationally efficient recovery methods. Since checking that RIP holds is NP-hard in general, there have been substantial efforts to find tractable proxies for it, which would allow both the construction of RIP matrices and the polynomial-time verification of RIP for an arbitrary matrix. We consider the framework of average-case certifiers, which never wrongly declare that a matrix is RIP while being correct on most random instances. While such certifiers exist and are tractable in a suboptimal parameter regime, we show that certification is computationally hard in any better regime. Our results are based on a new, weaker assumption on the problem of detecting dense subgraphs.
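To make the certification task concrete, here is a minimal sketch (ours, not the paper's) of the exact, brute-force check: the RIP constant of order $k$ is obtained by scanning every size-$k$ support, which is what makes exact verification intractable and motivates average-case certifiers.

```python
import itertools
import numpy as np

def rip_constant(A, k):
    """Brute-force restricted isometry constant of order k.

    Returns the smallest delta such that every k-sparse x satisfies
        (1 - delta) ||x||^2 <= ||A x||^2 <= (1 + delta) ||x||^2.
    The loop runs over all C(n, k) supports, so the cost is exponential in k.
    """
    n = A.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(n), k):
        cols = list(support)
        gram = A[:, cols].T @ A[:, cols]
        eigvals = np.linalg.eigvalsh(gram)  # extreme eigenvalues of the restricted Gram matrix
        delta = max(delta, abs(eigvals[0] - 1.0), abs(eigvals[-1] - 1.0))
    return delta

# A certifier in the sense above would declare "RIP" only when it is sure,
# e.g. only if some tractable upper bound on delta falls below the target.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20)) / np.sqrt(60)   # column-normalized Gaussian design
print(rip_constant(A, 3))
```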
Without-Replacement Sampling for Stochastic Gradient Methods
Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
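The contrast the abstract draws can be made concrete with a small, generic sketch (illustrative only, not the paper's algorithm): with-replacement SGD draws an independent index at every step, while without-replacement SGD shuffles the data once per pass and visits each point exactly once.

```python
import numpy as np

def sgd_least_squares(X, y, passes=5, lr=0.01, with_replacement=True, seed=0):
    """Plain SGD on 0.5 * (x_i @ w - y_i)^2, showing the two sampling schemes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(passes):
        if with_replacement:
            # i.i.d. indices: a point may be visited several times (or never) in a pass.
            order = rng.integers(0, n, size=n)
        else:
            # Fresh random permutation: each point is visited exactly once per pass,
            # so gradients within a pass are dependent, which complicates the analysis.
            order = rng.permutation(n)
        for i in order:
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w
```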
Deep State Space Models for Time Series Forecasting
We present a novel approach to probabilistic time series forecasting that combines state space models with deep learning. By parametrizing a per-time-series linear state space model with a jointly-learned recurrent neural network, our method retains desired properties of state space models such as data efficiency and interpretability, while making use of the ability to learn complex patterns from raw data offered by deep learning approaches. Our method scales gracefully from regimes where little training data is available to regimes where data from millions of time series can be leveraged to learn accurate models. We provide qualitative as well as quantitative results with the proposed method, showing that it compares favorably to the state-of-the-art.
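A toy sketch of the parametrization idea (our illustration, with a deliberately simplified local-level model; the actual method uses a richer state space model and a Kalman-filter likelihood): a recurrent network maps per-series covariates to the time-varying parameters of a linear-Gaussian state space model, so the latent structure stays interpretable while the network learns patterns shared across series.

```python
import torch
import torch.nn as nn

class ToyDeepSSM(nn.Module):
    """GRU emits per-step parameters of a local-level state space model:
        l_t = l_{t-1} + w_t,  w_t ~ N(0, q_t^2)    (latent level)
        z_t = l_t + v_t,      v_t ~ N(0, r_t^2)    (observation)
    The network predicts (q_t, r_t) from covariates; the SSM then provides the
    probabilistic forecast, e.g. via Kalman filtering (omitted here).
    """
    def __init__(self, cov_dim, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(cov_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)            # -> (log q_t, log r_t)

    def forward(self, covariates):                  # covariates: (batch, T, cov_dim)
        h, _ = self.rnn(covariates)                 # (batch, T, hidden)
        log_q, log_r = self.head(h).unbind(dim=-1)
        return log_q.exp(), log_r.exp()             # positive noise scales per step
```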
Random Forests as Statistical Procedures: Design, Variance, and Dependence
We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte Carlo error from a covariance floor that persists under infinite aggregation. The floor arises through two mechanisms: observation reuse, where the same training outcomes receive weight across multiple trees, and partition alignment, where independently generated trees discover similar conditional prediction rules. We prove the floor is strictly positive under minimal conditions and show that alignment persists even when sample splitting eliminates observation overlap entirely. We introduce procedure-aligned synthetic resampling (PASR) to estimate the covariance floor, decomposing the total prediction uncertainty of a deployed forest into interpretable components. For continuous outcomes, resulting prediction intervals achieve nominal coverage with a theoretically guaranteed conservative bias direction. For classification forests, the PASR estimator is asymptotically unbiased, providing the first pointwise confidence intervals for predicted conditional probabilities from a deployed forest. Nominal coverage is maintained across a range of design configurations for both outcome types, including high-dimensional settings. The underlying theory extends to any tree-based ensemble with an exchangeable tree-generating mechanism.
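For orientation, the kind of identity the abstract refers to can be written for an ensemble of $B$ exchangeable trees $\hat f_1,\dots,\hat f_B$ with forest prediction $\bar f_B(x) = \tfrac{1}{B}\sum_b \hat f_b(x)$ (standard exchangeability algebra; our notation, not the paper's):

```latex
\operatorname{Var}\bigl(\bar f_B(x)\bigr)
  = \underbrace{\frac{1}{B}\Bigl(\operatorname{Var}\bigl(\hat f_1(x)\bigr)
      - \operatorname{Cov}\bigl(\hat f_1(x),\hat f_2(x)\bigr)\Bigr)}_{\text{Monte Carlo error: vanishes as } B \to \infty}
  \;+\;
  \underbrace{\operatorname{Cov}\bigl(\hat f_1(x),\hat f_2(x)\bigr)}_{\text{covariance floor}} .
```

The floor term is exactly what observation reuse and partition alignment keep bounded away from zero as $B \to \infty$.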
Low-degree lower bounds for clustering in moderate dimension
Carpentier, Alexandra, Verzelen, Nicolas
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.
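For reference, the model and the separation parameter can be written as follows (generic notation for the isotropic Gaussian mixture; ours, for orientation only):

```latex
X_i = \mu_{z_i} + \varepsilon_i, \qquad
\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d), \qquad
z_i \in \{1,\dots,K\}, \qquad
\Delta = \min_{k \neq l} \|\mu_k - \mu_l\|_2 .
```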
Maximum entropy based testing in network models: ERGMs and constrained optimization
Ghosh, Subhrosekhar, Karmakar, Rathindra Nath, Lahiry, Samriddha
Stochastic network models play a central role across a wide range of scientific disciplines, and questions of statistical inference arise naturally in this context. In this paper we investigate goodness-of-fit and two-sample testing procedures for statistical networks based on the principle of maximum entropy (MaxEnt). Our approach formulates a constrained entropy-maximization problem on the space of networks, subject to prescribed structural constraints. The resulting test statistics are defined through the Lagrange multipliers associated with the constrained optimization problem, which, to our knowledge, is novel in the statistical networks literature. We establish consistency in the classical regime where the number of vertices is fixed. We then consider asymptotic regimes in which the graph size grows with the sample size, developing tests for both dense and sparse settings. In the dense case, we analyze exponential random graph models (ERGMs), including the Erdős–Rényi model, while in the sparse regime our theory applies to Erdős–Rényi graphs. Our analysis leverages recent advances in nonlinear large deviation theory for random graphs. We further show that the proposed Lagrange-multiplier framework connects naturally to classical score tests for constrained maximum likelihood estimation. The results provide a unified entropy-based framework for network model assessment across diverse growth regimes.
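The constrained program the abstract describes has the familiar exponential-family solution (a standard maximum-entropy derivation, stated here in generic notation): maximizing entropy over graph distributions subject to moment constraints on statistics $T_1,\dots,T_m$ yields an ERGM, whose natural parameters are the Lagrange multipliers,

```latex
\max_{P}\; -\sum_{G} P(G)\log P(G)
\quad \text{s.t.} \quad \mathbb{E}_{P}\bigl[T_j(G)\bigr] = t_j, \; j=1,\dots,m
\qquad \Longrightarrow \qquad
P_{\theta}(G) \;\propto\; \exp\Bigl(\sum_{j=1}^{m} \theta_j\, T_j(G)\Bigr),
```

which is the sense in which test statistics built from the multipliers $\theta_j$ connect to score tests for constrained maximum likelihood estimation.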
Path-conditioned training: a principled way to rescale ReLU neural networks
Lebeurrier, Arthur, Vayer, Titouan, Gribonval, Rémi
Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.
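The symmetry in question is easy to exhibit in a two-layer example (our illustration; the paper's path-lifting framework handles general architectures): because ReLU is positively homogeneous, multiplying a hidden neuron's incoming weights and bias by $\lambda > 0$ and its outgoing weights by $1/\lambda$ leaves the realized function unchanged while changing the parameters, and hence the gradient dynamics.

```python
import numpy as np

def relu_net(x, W1, b1, W2):
    """Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

rng = np.random.default_rng(0)
W1, b1, W2 = rng.standard_normal((5, 3)), rng.standard_normal(5), rng.standard_normal((2, 5))
x = rng.standard_normal(3)

# Rescale hidden neuron 0: incoming row (and bias) times lam, outgoing column times 1/lam.
lam = 7.3
W1r, b1r, W2r = W1.copy(), b1.copy(), W2.copy()
W1r[0] *= lam
b1r[0] *= lam
W2r[:, 0] /= lam

print(np.allclose(relu_net(x, W1, b1, W2), relu_net(x, W1r, b1r, W2r)))  # True: same function
```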
Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions
We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of the $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K > 2$.
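For orientation, the first task compares penalized estimators of the generic form (our notation, not necessarily the paper's exact setup), contrasting the $\ell_1$ and $\ell_2$ penalties when the mixture means are sparse:

```latex
\hat{w}_{\lambda} \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d}\;
\sum_{i=1}^{n} \ell\bigl(y_i,\, w^{\top} x_i\bigr) \;+\; \lambda\, r(w),
\qquad r(w) \in \bigl\{\, \|w\|_1,\ \tfrac{1}{2}\|w\|_2^2 \,\bigr\}.
```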