Africa
Limitations of SGD for Multi-Index Models Beyond Statistical Queries
Barzilai, Daniel, Shamir, Ohad
Understanding the limitations of gradient methods, and stochastic gradient descent (SGD) in particular, is a central challenge in learning theory. To that end, a commonly used tool is the Statistical Queries (SQ) framework, which studies performance limits of algorithms based on noisy interaction with the data. However, it is known that the formal connection between the SQ framework and SGD is tenuous: Existing results typically rely on adversarial or specially-structured gradient noise that does not reflect the noise in standard SGD, and (as we point out here) can sometimes lead to incorrect predictions. Moreover, many analyses of SGD for challenging problems rely on non-trivial algorithmic modifications, such as restricting the SGD trajectory to the sphere or using very small learning rates. To address these shortcomings, we develop a new, non-SQ framework to study the limitations of standard vanilla SGD, for single-index and multi-index models (namely, when the target function depends on a low-dimensional projection of the inputs). Our results apply to a broad class of settings and architectures, including (potentially deep) neural networks.
Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity
Luo, Hengrui, Ma, Anna, Stephan, Ludovic, Zhu, Yizhe
Matrix completion studies the problem of reconstructing a matrix from a (typically random) subset of its entries by exploiting prior structural assumptions such as low rank and incoherence. Roughly speaking, when the underlying n × n matrix has low rank and its eigenvectors are sufficiently incoherent, observing Ω(n log n) entries sampled uniformly at random is sufficient for exact recovery via efficient optimization methods [Keshavan et al., 2009, 2010, Candes and Tao, 2010, Candes and Plan, 2010, Recht, 2011, Candes and Recht, 2012, Jain et al., 2013]. This sample complexity is nearly optimal, since specifying a rank-r matrix requires only O(n) degrees of freedom. Tensor completion generalizes this problem to higher-order arrays, aiming to recover a low-rank tensor from a limited set of observed entries, for example, under uniform random sampling. As a natural higher-order analogue of matrix completion, tensor completion has found broad applications in areas such as recommendation systems [Frolov and Oseledets, 2017], signal and image processing [Govindu, 2005, Liu et al., 2012], and data science [Song et al., 2019]. Despite this close analogy, tensor completion behaves fundamentally differently from its matrix counterpart. In contrast to the classical matrix setting, tensor completion exhibits a pronounced trade-off between computational and statistical complexity: while information-theoretic considerations suggest that relatively few samples suffice for recovery, all currently known polynomial-time algorithms require substantially more observations than this optimal limit. Polynomial-time methods. A widely used polynomial-time approach to tensor completion is to reduce the problem to matrix completion via matricization.
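To make the matricization step mentioned above concrete, here is a minimal NumPy sketch. It is an illustration only, not the wedge sampling method of the paper: the helper name `unfold`, the sizes `n` and `r`, and the random CP-form test tensor are all assumptions chosen for the example. It shows that unfolding a low-rank third-order tensor along one mode gives a wide matrix whose rank is at most the tensor rank, which is what lets matrix-completion tools be applied.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization: rows index the chosen mode,
    columns index all remaining modes flattened together."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
n, r = 6, 2  # hypothetical side length and rank, chosen for illustration

# Build a rank-r tensor in CP form: T[i, j, k] = sum_s A[i, s] * B[j, s] * C[k, s]
A, B, C = (rng.standard_normal((n, r)) for _ in range(3))
T = np.einsum("ir,jr,kr->ijk", A, B, C)

M = unfold(T, 0)                     # an n x n^2 matrix
rank = np.linalg.matrix_rank(M)      # at most r for a CP rank-r tensor
print(M.shape, rank)
```

The unfolded matrix is highly rectangular (n rows, n² columns), which gives some intuition for why a direct reduction to matrix completion can demand more samples than the tensor's degrees of freedom would suggest.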
Optimal scaling laws in learning hierarchical multi-index models
Defilippis, Leonardo, Krzakala, Florent, Loureiro, Bruno, Maillard, Antoine
In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally, using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
Scientists want you to smell ancient Egyptian mummies
A mixture of archeology and chemistry brings the aroma of mummification to museums. Visiting a museum could soon be a truly multisensory experience--smells included. Thanks to recent advances in the field of biomolecular archeology, scientists can now detect traces of molecular fingerprints on ancient artifacts. From these tiny particles, scientists can determine how the objects may have smelled.
Principles of Lipschitz continuity in neural networks
Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advances, critical challenges remain -- most notably, ensuring robustness to small input perturbations and generalization to out-of-distribution data. These critical challenges underscore the need to understand the underlying fundamental principles that govern robustness and generalization. Among the theoretical tools available, Lipschitz continuity plays a pivotal role in governing the fundamental properties of neural networks related to robustness and generalization. It quantifies the worst-case sensitivity of a network's outputs to small input perturbations. While its importance is widely acknowledged, prior research has predominantly focused on empirical regularization approaches based on Lipschitz constraints, leaving the underlying principles less explored. This thesis seeks to advance a principled understanding of Lipschitz continuity in neural networks within the paradigm of machine learning, examined from two complementary perspectives: an internal perspective -- focusing on the temporal evolution of Lipschitz continuity in neural networks during training (i.e., training dynamics); and an external perspective -- investigating how Lipschitz continuity modulates the behavior of neural networks with respect to features in the input data, particularly its role in governing frequency signal propagation (i.e., modulation of frequency signal propagation).
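As a small illustration of what "worst-case sensitivity" means here, the sketch below computes a standard upper bound on the Lipschitz constant of a two-layer ReLU network (the product of the layers' spectral norms, valid because ReLU is 1-Lipschitz) and compares it with the sensitivity observed on one pair of inputs. The network, its sizes, and the names `mlp` and `upper_bound` are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer ReLU network with random weights (illustration only).
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))

def mlp(x):
    """f(x) = W2 * relu(W1 * x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# ReLU is 1-Lipschitz, so the product of spectral norms bounds the
# Lipschitz constant of f with respect to the Euclidean norm.
upper_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Observed sensitivity on one random pair of inputs.
x, y = rng.standard_normal(8), rng.standard_normal(8)
ratio = np.linalg.norm(mlp(x) - mlp(y)) / np.linalg.norm(x - y)

print(f"observed ratio {ratio:.3f} <= upper bound {upper_bound:.3f}")
```

The bound is usually loose: the true Lipschitz constant is the supremum of such ratios over all input pairs, and it can be much smaller than the product of spectral norms.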
Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks
Mandal, Bibhabasu, Nandy, Sagnik
In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $\beta$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite-sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite-sample characterization of privacy-utility trade-offs for parameter estimation in $\beta$ models, addressing the classical graph case and extending the analysis to higher-order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real-world communication network.
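For readers unfamiliar with the setting, the following sketch samples a graph from the classical $\beta$ model, in which nodes i and j are linked with probability exp(β_i + β_j) / (1 + exp(β_i + β_j)), and releases its degree sequence with the textbook Laplace mechanism under central, edge-level differential privacy. This is a generic illustration and not the estimators proposed in the paper: the function names, the parameter range, the graph size, and the choice of mechanism are all assumptions.

```python
import numpy as np

def sample_beta_model(beta, rng):
    """Sample a symmetric 0/1 adjacency matrix with
    P(edge i~j) = sigmoid(beta_i + beta_j) and no self-loops."""
    n = len(beta)
    probs = 1.0 / (1.0 + np.exp(-(beta[:, None] + beta[None, :])))
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # draw each pair once
    return (upper | upper.T).astype(int)

def private_degrees(adj, epsilon, rng):
    """Laplace mechanism on the degree sequence. Adding or removing one
    edge changes two degrees by 1 each, so the L1 sensitivity is 2."""
    degrees = adj.sum(axis=1)
    return degrees + rng.laplace(scale=2.0 / epsilon, size=degrees.shape)

rng = np.random.default_rng(0)
beta = rng.uniform(-1.0, 1.0, size=50)   # hypothetical node parameters
adj = sample_beta_model(beta, rng)
noisy = private_degrees(adj, epsilon=1.0, rng=rng)
print(adj.sum(axis=1)[:5], np.round(noisy[:5], 2))
```

Because the degree sequence is a sufficient statistic for the $\beta$ model, any estimator of the parameters can in principle be computed from the noisy degrees alone, at an accuracy cost that grows as `epsilon` shrinks.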
The drones being used in Sudan: 1,000 attacks since April 2023
During Sudan's civil war, which erupted in April 2023, both sides have increasingly relied on drones, and civilians have borne the brunt of the carnage. The conflict between the Sudanese armed forces (SAF) and the Rapid Support Forces (RSF) paramilitary group is an example of war transformed by commercially available, easily concealable unmanned aerial vehicles (UAVs), or drones. Modular, well-adapted to sanctions evasion and devastatingly effective, drones have killed scores of civilians, crippled infrastructure and plunged Sudanese cities into darkness. In this visual investigation, Al Jazeera examines the history of drone warfare in Sudan, the types of drones used by the warring sides, how they are sourced, where the attacks have occurred and the human toll. The RSF traces its origins to what at the time was a government-linked militia known as the Janjaweed.
Predicting and improving test-time scaling laws via reward tail-guided search
Li, Muheng, Qian, Jian, Mou, Wenlong
Test-time scaling has emerged as a critical avenue for enhancing the reasoning capabilities of Large Language Models (LLMs). Though the straightforward "best-of-$N$" (BoN) strategy has already demonstrated significant improvements in performance, it lacks principled guidance on the choice of $N$, budget allocation, and multi-stage decision-making, thereby leaving substantial room for optimization. While many works have explored such optimization, rigorous theoretical guarantees remain limited. In this work, we propose new methodologies to predict and improve scaling properties via tail-guided search. By estimating the tail distribution of rewards, our method predicts the scaling law of LLMs without the need for exhaustive evaluations. Leveraging this prediction tool, we introduce Scaling-Law Guided (SLG) Search, a new test-time algorithm that dynamically allocates compute to identify and exploit intermediate states with the highest predicted potential. We theoretically prove that SLG achieves vanishing regret compared to perfect-information oracles, and achieves expected rewards that would otherwise require a polynomially larger compute budget when using BoN. Empirically, we validate our framework across different LLMs and reward models, confirming that tail-guided allocation consistently achieves higher reward yields than Best-of-$N$ under identical compute budgets. Our code is available at https://github.com/PotatoJnny/Scaling-Law-Guided-search.
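The best-of-$N$ baseline described above can be sketched in a few lines. This is only the plain BoN strategy with toy stand-ins, not the SLG Search algorithm and not code from the linked repository: `toy_generate` and `toy_reward` are hypothetical placeholders for an LLM sampler and a reward model, and a "response" is just a Gaussian draw.

```python
import random

def best_of_n(generate, reward, n, rng):
    """Draw n candidate responses and keep the one with the highest reward."""
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins (assumptions): a response is a float and the reward is
# the response itself, so rewards follow a standard normal distribution.
def toy_generate(rng):
    return rng.gauss(0.0, 1.0)

def toy_reward(response):
    return response

def mean_best_reward(n, trials=2000, seed=0):
    """Monte Carlo estimate of the expected reward of best-of-n."""
    rng = random.Random(seed)
    total = sum(
        toy_reward(best_of_n(toy_generate, toy_reward, n, rng))
        for _ in range(trials)
    )
    return total / trials

scores = {n: mean_best_reward(n) for n in (1, 4, 16)}
print(scores)
```

In this toy setting the expected reward grows with $N$ but with diminishing returns, and how fast it grows is governed by the upper tail of the reward distribution, which is the quantity the abstract proposes to estimate.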