Goto

Collaborating Authors

 nonstationarity


Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning

Neural Information Processing Systems

Massively parallel GPU simulation environments have accelerated reinforcement learning (RL) research by enabling fast data collection for on-policy RL algorithms like Proximal Policy Optimization (PPO). To maximize throughput, it is common to use short rollouts per policy update, increasing the update-to-data (UTD) ratio. However, we find that, in this setting, standard synchronous resets introduce harmful nonstationarity, skewing the learning signal and destabilizing training. We introduce staggered resets, a simple yet effective technique where environments are initialized and reset at varied points within the task horizon. This yields training batches with greater temporal diversity, reducing the nonstationarity induced by synchronized rollouts. We characterize dimensions along which RL environments can benefit significantly from staggered resets through illustrative toy environments. We then apply this technique to challenging high-dimensional robotics environments, achieving significantly higher sample efficiency, faster wall-clock convergence, and stronger final performance. Finally, this technique scales better with more parallel environments compared to naive synchronized rollouts.


SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding

Neural Information Processing Systems

Intracortical Brain-Computer Interfaces (iBCI) decode behavior from neural population activity to restore motor functions and communication abilities in individuals with motor impairments. A central challenge for long-term iBCI deployment is the nonstationarity of neural recordings, where the composition and tuning profiles of the recorded populations are unstable across recording sessions. Existing approaches attempt to address this issue by explicit alignment techniques; however, they rely on fixed neural identities and require test-time labels or parameter updates, limiting their generalization across sessions and imposing additional computational burden during deployment. In this work, we address the problem of cross-session nonstationarity in long-term iBCI systems and introduce SPINT - a Spatial Permutation-Invariant Neural Transformer framework for behavioral decoding that operates directly on unordered sets of neural units. Central to our approach is a novel context-dependent positional embedding scheme that dynamically infers unit-specific identities, enabling flexible generalization across recording sessions. SPINT supports inference on variable-size populations and allows few-shot, gradient-free adaptation using a small amount of unlabeled data from the test session. We evaluate SPINT on three multi-session datasets from the FALCON Benchmark, covering continuous motor decoding tasks in human and non-human primates. SPINT demonstrates robust cross-session generalization, outperforming existing zero-shot and few-shot unsupervised baselines while eliminating the need for test-time alignment and fine-tuning. Our work contributes an initial step toward a robust and scalable neural decoding framework for long-term iBCI applications.







Fast Rates for Nonstationary Weighted Risk Minimization

arXiv.org Machine Learning

Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess risk into a learning term and an error term associated with distribution drift, and prove oracle inequalities for the learning error under mixing conditions. The learning bound holds uniformly over arbitrary weight classes and accounts for the effective sample size induced by the weight vector, the complexity of the weight and hypothesis classes, and potential data dependence. We illustrate the applicability and sharpness of our results in (auto-) regression problems with linear models, basis approximations, and neural networks, recovering minimax-optimal rates (up to logarithmic factors) when specialized to unweighted and stationary settings.


Time-uniform confidence bands for the CDF under nonstationarity

Neural Information Processing Systems

Estimation of a complete univariate distribution from a sequence of observations is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d.


Provably Efficient Algorithm for Nonstationary Low-Rank MDPs

Neural Information Processing Systems

Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL,and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.