 Africa


Fast Non-Episodic Finite-Horizon RL with K-Step Lookahead Thresholding

arXiv.org Machine Learning

Online reinforcement learning in non-episodic, finite-horizon MDPs remains underexplored and is challenged by the need to estimate returns to a fixed terminal time. Existing infinite-horizon methods, which often rely on discounted contraction, do not naturally account for this fixed-horizon structure. We introduce a modified Q-function: rather than targeting the full horizon, we learn a K-step lookahead Q-function that truncates planning to the next K steps. To further improve sample efficiency, we introduce a thresholding mechanism: actions are selected only when their estimated K-step lookahead value exceeds a time-varying threshold. We provide an efficient tabular learning algorithm for this novel objective and prove fast finite-sample convergence: it achieves minimax optimal constant regret for $K=1$ and $\mathcal{O}(\max((K-1),C_{K-1})\sqrt{SAT\log(T)})$ regret for any $K \geq 2$. We numerically evaluate the performance of our algorithm under the objective of maximizing reward. Our implementation adaptively increases K over time, balancing lookahead depth against estimation variance. Empirical results demonstrate superior cumulative rewards over state-of-the-art tabular RL methods across synthetic MDPs and RL environments: JumpRiverswim, FrozenLake and AnyTrading.
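
To make the idea of a K-step lookahead value concrete, here is a minimal Python sketch. It is not the paper's learning algorithm: it assumes the transition model `P` and rewards `R` are known and simply runs K steps of backward induction, whereas the abstract describes learning these values online. The function names, the toy two-state MDP and the fixed threshold value are all hypothetical, chosen only for illustration.

```python
def k_step_lookahead_q(P, R, K):
    """K-step lookahead Q-values for a known tabular MDP (illustrative only).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward.
    Returns q with q[s][a] = best expected reward over the next K steps
    when starting with action a in state s.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    n_states = len(P)
    v = [0.0] * n_states  # value with zero steps remaining
    q = []
    for _ in range(K):
        q = [
            [R[s][a] + sum(p * v[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
            for s in range(n_states)
        ]
        v = [max(row) for row in q]
    return q


def thresholded_actions(q_row, threshold):
    """Actions whose lookahead value exceeds the threshold (assumed rule)."""
    return [a for a, value in enumerate(q_row) if value > threshold]


# Hypothetical two-state MDP: action 0 stays, action 1 switches state.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # from state 1
]
R = [
    [0.1, 0.0],  # state 0: small reward for staying, none for moving
    [1.0, 0.0],  # state 1: large reward for staying
]

q1 = k_step_lookahead_q(P, R, K=1)
q2 = k_step_lookahead_q(P, R, K=2)
```

In this toy example, one-step lookahead prefers staying in state 0, while two-step lookahead prefers moving to state 1, which shows how a larger K can change the preferred action. In the paper the threshold is time-varying; the constant passed to `thresholded_actions` here is only a stand-in.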


Neuron Block Dynamics for XOR Classification with Zero-Margin

arXiv.org Machine Learning

The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where worst-case gradient bounds suffice. In contrast, we study zero-margin nonlinear classification by analyzing the Gaussian XOR problem, where inputs are Gaussian and the XOR decision boundary determines labels. In this setting, a non-negligible fraction of data lies arbitrarily close to the boundary, breaking standard margin-based arguments. Building on Glasgow's (2024) analysis, we extend the study of training dynamics from discrete to Gaussian inputs and develop a framework for the dynamics of neuron blocks. We show that neurons cluster into four directions and that block-level signals evolve coherently, a phenomenon essential in the Gaussian setting where individual neuron signals vary significantly. Leveraging this block perspective, we analyze generalization without relying on margin assumptions, adopting an average-case view that distinguishes regions of reliable prediction from regions of persistent error. Numerical experiments confirm the predicted two-phase block dynamics and demonstrate their robustness beyond the Gaussian setting.


Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

arXiv.org Machine Learning

It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of full-batch gradient descent (GD, which always reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) remains unclear. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.
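
The difference between the two training regimes can be sketched in a few lines of Python. This is an illustrative toy, not the paper's construction: it assumes the model $y = \langle x, \theta \rangle^2$ with standard Gaussian inputs, an empirical correlation loss of the form $-\frac{1}{n}\sum_i y_i \langle x_i, w \rangle^2$, and a spherical update that projects the gradient onto the tangent space and renormalizes. All function names, step sizes and problem sizes are hypothetical.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def make_data(theta, n, rng):
    """Gaussian inputs with labels y = <x, theta>^2 (quadratic activation)."""
    d = len(theta)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(a * b for a, b in zip(x, theta)) ** 2 for x in xs]
    return xs, ys


def spherical_step(w, xs, ys, lr):
    """One spherical gradient step on the assumed correlation loss."""
    n, d = len(xs), len(w)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        dot = sum(a * b for a, b in zip(x, w))
        for j in range(d):
            grad[j] -= 2.0 * y * dot * x[j] / n
    # Remove the radial component, step, then return to the unit sphere.
    radial = sum(g * a for g, a in zip(grad, w))
    tangent = [g - radial * a for g, a in zip(grad, w)]
    return normalize([a - lr * t for a, t in zip(w, tangent)])


def full_batch_gd(w, xs, ys, lr, steps):
    """Every step reuses all n samples."""
    for _ in range(steps):
        w = spherical_step(w, xs, ys, lr)
    return w


def one_pass_sgd(w, xs, ys, lr):
    """Each sample is used exactly once, then discarded."""
    for x, y in zip(xs, ys):
        w = spherical_step(w, [x], [y], lr)
    return w


rng = random.Random(0)
theta = normalize([1.0, 0.0, 0.0])
xs, ys = make_data(theta, 50, rng)
w0 = normalize([1.0, 1.0, 1.0])
w_gd = full_batch_gd(w0, xs, ys, lr=0.05, steps=20)
w_sgd = one_pass_sgd(w0, xs, ys, lr=0.05)
```

The only structural difference is which samples each step sees; the abstract's sample-complexity claims concern the high-dimensional regime and are not something this small example demonstrates.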


Locust swarms may meet their match in protein-enriched crops

Popular Science

The specialized crops could save farmers millions. A swarm of desert locusts fly after an aircraft sprayed pesticide in Meru, Kenya in 2021. Swarms of locusts devouring a farmer's livelihood might sound apocalyptic, but major locust infestations are a regular problem in agricultural communities around the world. These locust swarms--dense, droning packs of certain grasshopper species--can cover hundreds of square miles, and the insects consume vast amounts of vegetation and threaten global agriculture.


Three West African juntas have turned to Russia. Now the US wants to engage them

BBC News

The US has declared a stark policy shift towards three West African countries which are battling Islamist insurgents and whose military governments have broken defence ties with France and turned towards Russia. The state department announced that Nick Checker, head of its Bureau of African Affairs, would visit Mali's capital Bamako to convey the United States' respect for Mali's sovereignty and chart a new course in relations, moving past policy missteps. It adds that the US also looks forward to co-operating with Mali's allies, neighbouring Burkina Faso and Niger, on shared security and economic interests. Absent from the agenda is the longstanding American concern for democracy and human rights.


GRANITE: A Generalized Regional Framework for Identifying Agreement in Feature-Based Explanations

arXiv.org Machine Learning

Feature-based explanation methods aim to quantify how features influence the model's behavior, either locally or globally, but different methods often disagree, producing conflicting explanations. This disagreement arises primarily from two sources: how feature interactions are handled and how feature dependencies are incorporated. We propose GRANITE, a generalized regional explanation framework that partitions the feature space into regions where interaction and distribution influences are minimized. This approach aligns different explanation methods, yielding more consistent and interpretable explanations. GRANITE unifies existing regional approaches, extends them to feature groups, and introduces a recursive partitioning algorithm to estimate such regions. We demonstrate its effectiveness on real-world datasets, providing a practical tool for consistent and interpretable feature explanations.


A Random Matrix Theory of Masked Self-Supervised Regression

arXiv.org Machine Learning

Self-supervised learning (SSL) -- a training paradigm in which models learn useful representations from unlabeled data by exploiting the data itself as a source of supervision -- has emerged as a foundational component of the recent success of transformer architectures. By avoiding the need for manual annotations, SSL retains many of the benefits traditionally associated with supervised learning without relying on labeled data. Consequently, SSL is widely adopted as a pretraining paradigm for learning general-purpose representations that substantially accelerate the optimization of downstream tasks, especially in data-scarce settings. A canonical example of a self-supervised learning task is masked language modeling (MLM), in which a neural network is trained to predict masked tokens in text using the remaining tokens as contextual information (Devlin et al., 2019a; Howard and Ruder, 2018; Radford et al., 2018; Brown et al., 2020; OpenAI, 2024). For example, given the sentence "The capital of France is Paris", a typical MLM task would be to teach the model to infer, from the context words "France" and "Paris" in the masked sentence "The [MASK] of France is Paris", that we are speaking about the capital of a country.


Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval

arXiv.org Machine Learning

Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, equivalent to training a two-layer neural network with the quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from a variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.
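
One common way to read "preserving directional information while discarding scale" for a matrix-valued gradient is to replace the gradient $G = U \Sigma V^\top$ by its orthogonal polar factor $U V^\top$, which sets every nonzero singular value to 1. The Python sketch below approximates that factor with a simple cubic Newton-Schulz iteration. Whether this matches the exact SpecGD update analyzed in the paper is an assumption; the function names and the iteration count are hypothetical and chosen for simplicity.

```python
import math


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def spectral_direction(grad, iters=30):
    """Approximate U V^T for grad = U S V^T via cubic Newton-Schulz.

    Dividing by the Frobenius norm puts all singular values in (0, 1],
    where the map s -> 1.5*s - 0.5*s**3 drives each nonzero singular
    value towards 1 while leaving the singular vectors unchanged.
    """
    fro = math.sqrt(sum(v * v for row in grad for v in row))
    if fro == 0.0:
        return [row[:] for row in grad]
    x = [[v / fro for v in row] for row in grad]
    for _ in range(iters):
        cubic = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
            for row_x, row_c in zip(x, cubic)
        ]
    return x


# A gradient whose first direction has three times the scale of the second.
g = [[3.0, 0.0], [0.0, 1.0]]
direction = spectral_direction(g)

# Rescaling the gradient leaves the spectral direction unchanged.
direction_scaled = spectral_direction([[300.0, 0.0], [0.0, 100.0]])
```

For this diagonal example the result is close to the identity matrix, so the high-scale direction no longer dominates the update, which is the kind of scale removal the abstract refers to. A plain gradient step would instead move three times further along the first direction.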


Drone strikes in Ethiopia's Tigray kill one amid fears of renewed conflict

Al Jazeera

One person has been killed and another injured in drone strikes in Ethiopia's northern Tigray region, a senior Tigrayan official and a humanitarian worker said, in another sign of renewed conflict between regional and federal forces. The Tigrayan official on Saturday said the drone strikes hit two Isuzu trucks near Enticho and Gendebta, two places in Tigray about 20km (12 miles) apart. A local humanitarian worker confirmed the strikes had happened. Both asked not to be named, the Reuters news agency reported. It was not immediately clear what the trucks were carrying.


Secret warehouse guards lost world of treasures found on HS2 route

BBC News

Treasures unearthed by hundreds of archaeologists so far during work on the controversial planned HS2 train line have been shown exclusively to the BBC. The 450,000 objects, which are being held in a secret warehouse, include a possible Roman gladiator's tag, a hand axe that may be more than 40,000 years old and 19th Century gold dentures. It is an unprecedented amount and array of items, which will yield new insights into Britain's past, says the Centre for British Archaeology. Major building developments in the UK need land to be assessed by archaeologists as part of the planning process, to protect heritage sites. Since 2018 around 1,000 archaeologists have been involved in 60 digs along the route HS2 is set to take between London and Birmingham.