 non-stationary environment


Non-Stationary Functional Bilevel Optimization

Bohne, Jason, Petrulionyte, Ieva, Arbel, Michael, Mairal, Julien, Polak, Paweł

arXiv.org Machine Learning

Functional bilevel optimization (FBO) provides a powerful framework for hierarchical learning in function spaces, yet current methods are limited to static offline settings and perform suboptimally in online, non-stationary scenarios. We propose SmoothFBO, the first algorithm for non-stationary FBO with both theoretical guarantees and practical scalability. SmoothFBO introduces a time-smoothed stochastic hypergradient estimator that reduces variance through a window parameter, enabling stable outer-loop updates with sublinear regret. Importantly, the classical parametric bilevel case is a special reduction of our framework, making SmoothFBO a natural extension to online, non-stationary settings. Empirically, SmoothFBO consistently outperforms existing FBO methods in non-stationary hyperparameter optimization and model-based reinforcement learning, demonstrating its practical effectiveness. Together, these results establish SmoothFBO as a general, theoretically grounded, and practically viable foundation for bilevel optimization in online, non-stationary scenarios.
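The variance-reduction idea behind a window parameter can be illustrated with a plain running mean over the most recent gradient estimates. This is only a minimal sketch under stated assumptions, not the SmoothFBO estimator itself: the class name `WindowSmoother`, the uniform weighting, and the list-of-floats gradient representation are all hypothetical choices made for illustration.

```python
from collections import deque


class WindowSmoother:
    """Running mean of the last `window` gradient estimates.

    Hypothetical helper for illustration; the uniform average is an
    assumption, not the estimator defined in the paper.
    """

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be >= 1")
        # deque with maxlen silently drops the oldest estimate when full
        self.buffer = deque(maxlen=window)

    def update(self, grad):
        """Store a new estimate and return the mean over the current window."""
        self.buffer.append(list(grad))
        n = len(self.buffer)
        return [sum(g[i] for g in self.buffer) / n for i in range(len(grad))]
```

A larger window averages over more estimates, which lowers variance but reacts more slowly when the underlying objective drifts; that trade-off is what a window parameter would tune in a non-stationary setting.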


Test-time Adaptation in Non-stationary Environments via Adaptive Representation Alignment

Neural Information Processing Systems

Adapting to distribution shifts is a critical challenge in modern machine learning, especially as data in many real-world applications accumulate continuously in the form of streams. We investigate the problem of sequentially adapting a model to non-stationary environments, where the data distribution is continuously shifting and only a small amount of unlabeled data is available at each time step. Continual test-time adaptation methods have shown promising results by using reliable pseudo-labels, but they still fall short in exploring representation alignment with the source domain in non-stationary environments. In this paper, we propose to leverage non-stationary representation learning to adaptively align the unlabeled data stream, with its changing distributions, to the source data representation using a sketch of the source data. To alleviate the data scarcity in non-stationary representation learning, we propose a novel adaptive representation alignment algorithm called Ada-ReAlign. This approach employs a group of base learners to explore different lengths of the unlabeled data stream, which are adaptively combined by a meta learner to handle unknown and continuously evolving data distributions. The proposed method comes with theoretical guarantees under convexity assumptions. Experiments on both benchmark datasets and a real-world application validate the effectiveness and adaptability of our proposed algorithm.


Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Neural Information Processing Systems

We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback, employing *dynamic regret* as the performance measure. We start with in-depth analyses of the strengths and limitations of the two most popular methods: occupancy-measure-based and policy-based methods. We observe that while the occupancy-measure-based method is effective in addressing non-stationary environments, it encounters difficulties with the unknown transition. In contrast, the policy-based method can deal with the unknown transition effectively but faces challenges in handling non-stationary environments. Building on this, we propose a novel algorithm that combines the benefits of both methods. Specifically, it employs (i) an *occupancy-measure-based global optimization* with a two-layer structure to handle non-stationary environments; and (ii) a *policy-based variance-aware value-targeted regression* to tackle the unknown transition.


Non-stationary Bandits with Knapsacks

Neural Information Processing Systems

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker chooses an arm to play, receives a reward, and consumes a certain amount of each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or an adversarial environment.


Dynamic Regret of Policy Optimization in Non-Stationary Environments

Neural Information Processing Systems

We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels. We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret. Compared with the classical notion of static regret, dynamic regret is a stronger notion as it explicitly accounts for the non-stationarity of environments. The dynamic regret attained by the proposed algorithms interpolates between different regimes of non-stationarity, and moreover satisfies a notion of adaptive (near-)optimality, in the sense that it matches the (near-)optimal static regret under slow-changing environments. The dynamic regret bound features two components, one arising from exploration, which deals with the uncertainty of transition kernels, and the other arising from adaptation, which deals with non-stationary environments. Specifically, we show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction. To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments.
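The contrast between static and dynamic regret can be written out as follows. This is a sketch in generic notation, which is an assumption here and may differ from the paper's: $K$ is the number of episodes, $\pi_k$ the policy played in episode $k$, and $V_k^{\pi}$ the value of policy $\pi$ under the reward function of episode $k$.

```latex
% Static regret: compare against the single best fixed policy in hindsight.
\mathrm{Regret}_{\mathrm{static}}(K)
  = \max_{\pi} \sum_{k=1}^{K} V_k^{\pi} - \sum_{k=1}^{K} V_k^{\pi_k}

% Dynamic regret: compare against the best policy of each episode.
\mathrm{Regret}_{\mathrm{dynamic}}(K)
  = \sum_{k=1}^{K} \Bigl( \max_{\pi} V_k^{\pi} - V_k^{\pi_k} \Bigr)
```

Because a sum of per-episode maxima is at least the maximum of the sum, the dynamic regret is never smaller than the static regret, which is why it is the stronger notion.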


Evolving Graph Learning for Out-of-Distribution Generalization in Non-stationary Environments

Sun, Qingyun, Luo, Jiayi, Yuan, Haonan, Fu, Xingcheng, Peng, Hao, Li, Jianxin, Yu, Philip S.

arXiv.org Artificial Intelligence

Graph neural networks have shown remarkable success in exploiting the spatial and temporal patterns on dynamic graphs. However, existing GNNs exhibit poor generalization ability under distribution shifts, which are inevitable in dynamic scenarios. As dynamic graph generation progresses amid evolving latent non-stationary environments, it is imperative to explore their effects on out-of-distribution (OOD) generalization. This paper proposes a novel Evolving Graph Learning framework for OOD generalization (EvoOOD) by environment-aware invariant pattern recognition. Specifically, we first design an environment sequential variational auto-encoder to model environment evolution and infer the underlying environment distribution. Then, we introduce a mechanism for environment-aware invariant pattern recognition, tailored to address environmental diversification through inferred distributions. Finally, we conduct fine-grained causal interventions on individual nodes using a mixture of instantiated environment samples. This approach helps to distinguish spatio-temporal invariant patterns for OOD prediction, especially in non-stationary environments. Experimental results demonstrate the superiority of EvoOOD on both real-world and synthetic dynamic datasets under distribution shifts. To the best of our knowledge, it is the first attempt to study the dynamic graph OOD generalization problem from the environment evolution perspective.