Tempo Adaptation in Non-stationary Reinforcement Learning

Neural Information Processing Systems 

We first raise and tackle a ``time synchronization'' issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time signifies the actual elapsed time within the fixed duration $t \in [0, T]$. In existing works, at episode $k$, the agent rolls out a trajectory and trains a policy before transitioning to episode $k+1$. In the context of the time-desynchronized environment, however, the agent at time $t_k$ allocates $\Delta t$ for trajectory generation and training, and subsequently moves to the next episode at $t_{k+1} = t_k + \Delta t$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories influenced by the choice of interaction times $(t_1, t_2, \ldots, t_K)$, significantly impacting the suboptimality gap of the policy.
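To make the time-desynchronized protocol concrete, the following is a minimal Python sketch of the interaction loop described above. `NonStationaryEnv`, `Agent`, the linear reward drift, and the fixed per-episode cost `delta_t` are all illustrative assumptions, not the paper's actual method.

```python
class NonStationaryEnv:
    """Toy environment whose reward drifts with wall-clock time t, not episode index k."""

    def __init__(self, horizon=10):
        self.horizon = horizon

    def rollout(self, policy, t):
        # Assumed linear drift: the same policy earns less as wall-clock time passes.
        drift = 0.1 * t
        return [(policy(state), 1.0 - drift) for state in range(self.horizon)]


class Agent:
    def __init__(self):
        self.theta = 0.0

    def policy(self, state):
        return self.theta  # trivial constant policy, for illustration only

    def train(self, trajectory):
        # One crude update from the mean reward of the collected trajectory.
        rewards = [r for _, r in trajectory]
        self.theta += 0.01 * sum(rewards) / len(rewards)


T = 100.0       # total wall-clock duration, t in [0, T]
K = 20          # fixed total number of episodes
delta_t = 2.0   # wall-clock time consumed by one rollout plus one training step

env, agent = NonStationaryEnv(), Agent()
t_k = 0.0
for k in range(K):
    if t_k > T:
        break
    # The trajectory reflects the environment at wall-clock time t_k, not at
    # episode index k, so a different interaction schedule yields different data.
    trajectory = env.rollout(agent.policy, t_k)
    agent.train(trajectory)
    t_k += delta_t  # t_{k+1} = t_k + delta_t
```

Here the interaction times happen to be uniformly spaced; the point of the abstract is that other choices of $(t_1, \ldots, t_K)$ produce different trajectories and hence a different suboptimality gap.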