AITopics | Africa

Understanding the limitations of gradient methods, and stochastic gradient descent (SGD) in particular, is a central challenge in learning theory. To that end, a commonly used tool is the Statistical Queries (SQ) framework, which studies performance limits of algorithms based on noisy interaction with the data. However, it is known that the formal connection between the SQ framework and SGD is tenuous: Existing results typically rely on adversarial or specially-structured gradient noise that does not reflect the noise in standard SGD, and (as we point out here) can sometimes lead to incorrect predictions. Moreover, many analyses of SGD for challenging problems rely on non-trivial algorithmic modifications, such as restricting the SGD trajectory to the sphere or using very small learning rates. To address these shortcomings, we develop a new, non-SQ framework to study the limitations of standard vanilla SGD, for single-index and multi-index models (namely, when the target function depends on a low-dimensional projection of the inputs). Our results apply to a broad class of settings and architectures, including (potentially deep) neural networks.
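To make the setting concrete, the following is a minimal sketch of plain online SGD on a single-index target, where the label depends on the input only through one projection. The tanh link, the dimension, the learning rate, and the names `sgd_single_index` and `w_star` are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sgd_single_index(w_star, steps=5000, lr=0.05, seed=0):
    """Online SGD on squared loss for the target y = tanh(<w_star, x>).

    Illustrative assumptions: Gaussian inputs, a tanh link, a student of the
    same form, and one fresh sample per step (no sphere constraint).
    """
    rng = random.Random(seed)
    d = len(w_star)
    w = [0.0] * d  # student weights, initialised at zero
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = math.tanh(sum(a * b for a, b in zip(w_star, x)))
        pred = math.tanh(sum(a * b for a, b in zip(w, x)))
        # Gradient of 0.5 * (pred - y)^2 with respect to w.
        scale = (pred - y) * (1.0 - pred * pred)
        w = [wi - lr * scale * xi for wi, xi in zip(w, x)]
    return w


def cosine(u, v):
    """Cosine similarity between two vectors, used to measure alignment."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


w_star = [1.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical hidden direction
w_hat = sgd_single_index(w_star)
alignment = cosine(w_hat, w_star)
```

This is deliberately an easy instance, chosen so the loop converges quickly; it only illustrates the unmodified update rule that the abstract refers to as vanilla SGD.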

artificial intelligence, machine learning, probability, (19 more...)

arXiv.org Machine Learning

2602.05704

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)


Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity

Luo, Hengrui, Ma, Anna, Stephan, Ludovic, Zhu, Yizhe

arXiv.org Machine Learning | Feb-6-2026

Matrix completion studies the problem of reconstructing a matrix from a (typically random) subset of its entries by exploiting prior structural assumptions such as low rank and incoherence. Roughly speaking, when the underlying n × n matrix has low rank and its eigenvectors are sufficiently incoherent, observing Ω(n log n) entries sampled uniformly at random is sufficient for exact recovery via efficient optimization methods [Keshavan et al., 2009, 2010, Candes and Tao, 2010, Candes and Plan, 2010, Recht, 2011, Candes and Recht, 2012, Jain et al., 2013]. This sample complexity is nearly optimal, since specifying a rank-r matrix requires only O(n) degrees of freedom. Tensor completion generalizes this problem to higher-order arrays, aiming to recover a low-rank tensor from a limited set of observed entries, for example, under uniform random sampling. As a natural higher-order analogue of matrix completion, tensor completion has found broad applications in areas such as recommendation systems [Frolov and Oseledets, 2017], signal and image processing [Govindu, 2005, Liu et al., 2012], and data science [Song et al., 2019]. Despite this close analogy, tensor completion behaves fundamentally differently from its matrix counterpart. In contrast to the classical matrix setting, tensor completion exhibits a pronounced trade-off between computational and statistical complexity: while information-theoretic considerations suggest that relatively few samples suffice for recovery, all currently known polynomial-time algorithms require substantially more observations than this optimal limit. Polynomial-time methods. A widely used polynomial-time approach to tensor completion is to reduce the problem to matrix completion via matricization.
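As a rough illustration of matricization, the sketch below unfolds an order-3 tensor along its first mode into a matrix, which is the object a matrix-completion routine would then operate on. The function name `unfold_mode0` and the nested-list representation are assumptions made for this example; column-ordering conventions for unfoldings differ between references.

```python
def unfold_mode0(tensor):
    """Mode-0 unfolding of an n1 x n2 x n3 nested-list tensor.

    Row i of the result concatenates the n2 x n3 slice tensor[i] row by row,
    giving an n1 x (n2 * n3) matrix.
    """
    return [[entry for row in slab for entry in row] for slab in tensor]


# A small 2 x 2 x 3 example with T[i][j][k] = 100*i + 10*j + k.
T = [[[100 * i + 10 * j + k for k in range(3)] for j in range(2)] for i in range(2)]
M = unfold_mode0(T)
```

Here `M` has 2 rows and 6 columns, so an n × n × n tensor becomes an n × n² matrix under this unfolding.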

completion, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2602.05869

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining (0.92)


Optimal scaling laws in learning hierarchical multi-index models

Defilippis, Leonardo, Krzakala, Florent, Loureiro, Bruno, Maillard, Antoine

arXiv.org Machine Learning | Feb-6-2026

In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally, using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

2602.05846

Country:

North America > United States (0.14)
Europe > France (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)


Scientists want you to smell ancient Egyptian mummies

Popular Science | Feb-5-2026, 05:00:00 GMT

A mixture of archeology and chemistry brings the aroma of mummification to museums. Visiting a museum could soon be a truly multisensory experience--smells included. Thanks to recent advances in the field of biomolecular archeology, scientists can now detect traces of molecular fingerprints on ancient artifacts. From these tiny particles, scientists can determine how the objects may have smelled.

ancient egyptian mummy, artificial intelligence, service and privacy policy, (9 more...)

Popular Science

Country:

Africa > Middle East > Egypt (0.41)
Europe > Germany (0.31)

Genre: Research Report > New Finding (0.36)

Industry:

Leisure & Entertainment (0.49)
Media (0.30)

Technology: Information Technology > Artificial Intelligence (0.70)


Principles of Lipschitz continuity in neural networks

Luo, Róisín

arXiv.org Machine Learning | Feb-5-2026

Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advances, critical challenges remain -- most notably, ensuring robustness to small input perturbations and generalization to out-of-distribution data. These challenges underscore the need to understand the fundamental principles that govern robustness and generalization. Among the theoretical tools available, Lipschitz continuity plays a pivotal role in governing the properties of neural networks related to robustness and generalization. It quantifies the worst-case sensitivity of a network's outputs to small input perturbations. While its importance is widely acknowledged, prior research has predominantly focused on empirical regularization approaches based on Lipschitz constraints, leaving the underlying principles less explored. This thesis seeks to advance a principled understanding of Lipschitz continuity in neural networks within the paradigm of machine learning, examined from two complementary perspectives: an internal perspective -- focusing on the temporal evolution of Lipschitz continuity in neural networks during training (i.e., training dynamics); and an external perspective -- investigating how Lipschitz continuity modulates the behavior of neural networks with respect to features in the input data, particularly its role in governing frequency signal propagation (i.e., modulation of frequency signal propagation).
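For reference, the standard definition behind "worst-case sensitivity" can be written as follows. The notation ($f$, $K$, $W_\ell$, $\sigma$) is assumed here for illustration and is not taken from the thesis; the layer-wise product is the usual upper bound for a feedforward network with 1-Lipschitz activations, not a result specific to this work.

```latex
% f is K-Lipschitz if, for all inputs x and y,
\| f(x) - f(y) \| \le K \, \| x - y \| ,
% and the Lipschitz constant is the smallest such K:
\operatorname{Lip}(f) = \sup_{x \ne y} \frac{\| f(x) - f(y) \|}{\| x - y \|} .
% For f = W_L \circ \sigma \circ W_{L-1} \circ \cdots \circ \sigma \circ W_1
% with a 1-Lipschitz activation \sigma (e.g. ReLU):
\operatorname{Lip}(f) \le \prod_{\ell=1}^{L} \| W_\ell \|_2 .
```

The product bound is generally loose, since the worst-case directions of successive layers need not align.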

artificial intelligence, interpreting global perturbation robustness, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.13025/30167

2602.04078

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
(24 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Government > Regional Government (0.45)
Education > Educational Setting > Online (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

Mandal, Bibhabasu, Nandy, Sagnik

arXiv.org Machine Learning | Feb-5-2026

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $β$ model, the prototypical statistical model for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite-sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite-sample characterization of privacy-utility trade-offs for parameter estimation in $β$ models, addressing the classical graph case and extending the analysis to higher-order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real-world communication network.
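As a hedged illustration of the graph case, the sketch below samples degrees from a $β$ model, in which edge {i, j} appears with probability exp(β_i + β_j) / (1 + exp(β_i + β_j)), and then releases them with Laplace noise under central edge-level differential privacy. The Laplace mechanism is a textbook choice swapped in for illustration and is not claimed to be the authors' estimator; the function names and the parameter values are hypothetical.

```python
import math
import random


def sample_beta_model(beta, rng):
    """Sample an undirected graph and return its degree sequence.

    Edge {i, j} appears independently with probability
    exp(beta_i + beta_j) / (1 + exp(beta_i + beta_j)).
    """
    n = len(beta)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
            if rng.random() < p:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_degrees(degrees, epsilon, rng):
    """Release degrees with Laplace noise of scale 2 / epsilon.

    Adding or removing one edge changes two degrees by 1 each, so the L1
    sensitivity of the degree vector is 2.
    """
    return [d + laplace_noise(2.0 / epsilon, rng) for d in degrees]


rng = random.Random(0)
beta = [0.5, -0.2, 0.1, 0.0, -0.4]  # hypothetical node parameters
degrees = sample_beta_model(beta, rng)
noisy = private_degrees(degrees, epsilon=1.0, rng=rng)
```

Smaller values of `epsilon` give stronger privacy but noisier degrees, which is the privacy-utility trade-off the abstract quantifies.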

artificial intelligence, differential privacy, machine learning, (17 more...)

arXiv.org Machine Learning

2602.03948

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


The drones being used in Sudan: 1,000 attacks since April 2023

Al Jazeera | Feb-3-2026, 06:24:33 GMT

During Sudan's civil war, which erupted in April 2023, both sides have increasingly relied on drones, and civilians have borne the brunt of the carnage. The conflict between the Sudanese armed forces (SAF) and the Rapid Support Forces (RSF) paramilitary group is an example of war transformed by commercially available, easily concealable unmanned aerial vehicles (UAVs), or drones. Modular, well-adapted to sanctions evasion and devastatingly effective, drones have killed scores of civilians, crippled infrastructure and plunged Sudanese cities into darkness. In this visual investigation, Al Jazeera examines the history of drone warfare in Sudan, the types of drones used by the warring sides, how they are sourced, where the attacks have occurred and the human toll. The RSF traces its origins to what at the time was a government-linked militia known as the Janjaweed.

artificial intelligence, drone, sudan, (17 more...)

Al Jazeera

Country:

Africa > Sudan (1.00)
Africa > Middle East > Libya (0.30)
Asia > Middle East > Yemen (0.29)

Industry:

Information Technology (1.00)
Government > Military > Army (0.70)
Government > Military > Air Force (0.47)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)


Predicting and improving test-time scaling laws via reward tail-guided search

Li, Muheng, Qian, Jian, Mou, Wenlong

arXiv.org Machine Learning | Feb-3-2026

Test-time scaling has emerged as a critical avenue for enhancing the reasoning capabilities of Large Language Models (LLMs). Though the straightforward "best-of-$N$" (BoN) strategy has already demonstrated significant improvements in performance, it lacks principled guidance on the choice of $N$, budget allocation, and multi-stage decision-making, thereby leaving substantial room for optimization. While many works have explored such optimization, rigorous theoretical guarantees remain limited. In this work, we propose new methodologies to predict and improve scaling properties via tail-guided search. By estimating the tail distribution of rewards, our method predicts the scaling law of LLMs without the need for exhaustive evaluations. Leveraging this prediction tool, we introduce Scaling-Law Guided (SLG) Search, a new test-time algorithm that dynamically allocates compute to identify and exploit intermediate states with the highest predicted potential. We theoretically prove that SLG achieves vanishing regret compared to perfect-information oracles, and achieves expected rewards that would otherwise require a polynomially larger compute budget when using BoN. Empirically, we validate our framework across different LLMs and reward models, confirming that tail-guided allocation consistently achieves higher reward yields than Best-of-$N$ under identical compute budgets. Our code is available at https://github.com/PotatoJnny/Scaling-Law-Guided-search.
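For readers unfamiliar with the baseline, the best-of-$N$ strategy can be sketched in a few lines: draw $N$ candidate responses and keep the one the reward model scores highest. This is only the baseline, not the SLG algorithm, and it is unrelated to the code in the linked repository; the sampler and reward function below are hypothetical stand-ins (a "response" is just a number and its reward is the number itself).

```python
import random


def best_of_n(sample, reward, n):
    """Draw n candidates and return the one with the highest reward."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=reward)


# Hypothetical stand-ins: a "response" is a number and its reward is itself.
rng = random.Random(0)
draw = lambda: rng.gauss(0.0, 1.0)
score = lambda response: response

best_1 = best_of_n(draw, score, 1)
best_64 = best_of_n(draw, score, 64)
```

Under these stand-in assumptions, the gain from increasing `n` is governed by the upper tail of the reward distribution, which is the quantity the abstract proposes to estimate.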

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2602.01485

Country:

North America > Canada > Ontario > Toronto (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
